Long-term Research Objective:

The  physical  basis  for  biological  complexity  is  context-dependent  expression  of  the  organism’s 
genome.  The  context  is  provided  by  the  life  cycle  of  the  organism;  the  molecular  mechanisms  of 
gene  regulation  interpret  that  context.  We  seek  to  develop  a  theoretical  framework  for 
quantitatively  relating  the  integrated  behavior  of  gene  circuits  to  their  underlying  molecular 
determinants,  and,  by  applying  this  theory  to  specific  classes  of  gene  circuits,  we  hope  to 
discover  the  basic  principles  that  govern  their  design  by  natural  selection. 

S  &  T  Objectives: 

To  the  best  of  our  knowledge,  there  has  been  no  concerted  effort  to  understand  the  role  of  gene 
circuitry  as  a  robust  computational  device.  Under  what  conditions  will  gene  control  elements 
and  their  circuitry  be  maintained  in  the  face  of  mutational  entropy?  What  is  the  computational 
potential  of  such  robust  circuits,  and  can  one  design  selection  strategies  to  direct  their  evolution? 
These  related  questions  represent  the  principal  objectives  of  this  work. 

Approach: 

We  will  first  analyze  known  molecular  modes  of  gene  control  and  the  spectrum  of  computations 
that  they  are  capable  of  performing.  Second,  we  will  determine  the  selective  pressures  that  in  the 
presence  of  mutation  lead  to  the  emergence  and  maintenance  of  particular  computational  circuits 
involving  these  elements.  Finally,  we  will  use  all  this  information  in  an  attempt  to  design 
specific  computational  solutions  through  a  process  of  directed  evolution. 

S  &  T  Completed: 

Analysis of Alternative Molecular Modes of Gene Control and of the Selective Pressures
That Lead to the Emergence and Maintenance of Particular Computational Circuits

We  have  refined  our  development  of  the  quantitative  implications  of  demand  theory.  A 
summary  of  the  preliminary  results  that  were  obtained  in  the  first  year  of  the  grant  appeared  as  a 
book  chapter  [1].  The  detailed  development  of  the  theory  and  applications  with  refined 
parameter  estimates  were  presented  in  two  Genetics  papers.  In  the  first  paper  [2],  a  theory  is 
developed  that  ties  together  a  number  of  important  variables,  including  growth  rates,  mutation 
rates,  minimum  and  maximum  demands  for  gene  expression,  and  minimum  and  maximum 
durations  for  the  life  cycle  of  the  organism.  Applications  of  the  theory  are  provided  in  the 
second  paper  [3],  where  regulation  of  the  lactose  and  maltose  operons  of  Escherichia  coli  is 
analyzed  and  the  results  are  compared  with  independent  experimental  data. 

The  quantitative  development  of  demand  theory  not  only  confirms  and  quantifies  the  previous 
qualitative  predictions,  but  it  also  identifies  critical  factors  and  reveals  new  relationships  [2]. 
The recursive equations that characterize the population dynamics of mutant and wild-type
organisms allow one to predict the time course for selection. The form of these equations also
allows one to predict that the response time for selection is independent of the ON/OFF cycle
time of the gene, C, whereas it depends strongly on the demand for gene expression, D.
The  steady-state  solution  of  the  recursive  equations  provides  estimates  for  the  extent  of  selection. 
A  threshold  for  selection  is  determined  by  the  relationship  between  cycle  time  C  and  demand  D 
that  results  when  the  extent  of  selection  is  set  equal  to  the  criterion  for  selection. 
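
To make the notions of a recursion, its time course, and its steady state concrete, the following is a minimal sketch of a generic textbook mutation-selection recursion. It is not the set of equations developed in [2]: it omits the cycle time C and the demand D entirely, and the function names, the mutation probability `mu`, and the relative fitness `w` are hypothetical choices made only for illustration.

```python
def next_mutant_fraction(x, mu, w):
    """Advance the mutant fraction x by one generation.

    mu: probability that a wild-type organism mutates (assumed value).
    w:  fitness of the mutant relative to wild type (assumed value).
    Mutation is applied first, then selection.
    """
    mutated = x + mu * (1.0 - x)                  # fraction mutant after mutation
    mean_fitness = w * mutated + (1.0 - mutated)  # population mean fitness
    return w * mutated / mean_fitness


def iterate_to_steady_state(x0, mu, w, tol=1e-12, max_steps=1_000_000):
    """Iterate the recursion until successive values differ by less than tol.

    Returns the final mutant fraction and the number of steps taken,
    which serves as a crude measure of the response time.
    """
    x = x0
    for step in range(1, max_steps + 1):
        x_new = next_mutant_fraction(x, mu, w)
        if abs(x_new - x) < tol:
            return x_new, step
        x = x_new
    return x, max_steps


# Hypothetical parameter values, not taken from the report.
steady, steps = iterate_to_steady_state(0.0, mu=1e-6, w=0.99)
print(f"steady-state mutant fraction ~ {steady:.3e} after {steps} generations")
```

In this toy model the steady-state mutant fraction is close to mu*w/(1-w) when mutation is rare, which illustrates how a steady-state solution can be read as an extent of selection.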

The  thresholds  for  selection  in  the  C  vs.  D  plot  define  regions  within  which  selection  of  the 
positive  or  negative  mode  of  regulation  is  realizable.  Their  intersection  defines  a  maximum 
value  for  the  cycle  time  Cmax,  and  their  asymptotes  define  minimum  Dmin  and  maximum 
Dmax  values  of  the  demand  for  gene  expression.  These  regions  also  exhibit  an  inherent 
asymmetry  that  favors  selection  of  the  positive  mode. 

In  summary,  the  quantitative  development  of  demand  theory  reveals  unexpected  relationships 
between  the  demand  for  gene  expression  D  and  the  average  ON/OFF  cycle  time  for  the  gene  C, 
which  is  a  manifestation  of  the  organism’s  life  cycle.  The  demand  theory  of  gene  regulation  can 
be  extended  within  the  framework  presented  here  to  include  organisms  with  life  cycles  that  are 
more  complex  than  the  two  phases  illustrated  in  this  paper  and  regulatory  systems  that  are  more 
complex  than  a  single  mechanism  of  gene  control. 

The  application  of  demand  theory  to  the  lactose  and  maltose  operons  of  E.  coli  provides  an 
opportunity  to  test  a  number  of  the  theory's  quantitative  implications  [3]. 

With  the  parameter  values  that  represent  the  lactose  and  maltose  operons,  the  time  required  to 
reach  full  selection  is  independent  of  cycle  time  but  decreases  until  a  minimum  is  reached  with 
increasing  demand  (negative  mode)  or  decreasing  demand  (positive  mode).  On  the  other  hand, 
the  extent  of  selection  is  dependent  on  the  value  of  C  and  increases,  reaches  a  maximum,  and 
then  declines  as  demand  increases.  The  combination  of  these  results  suggests  that  the  optimum 
extent and rate of selection occur at around D=0.001 for the negative mode and 1-D=0.01 for
the  positive  mode.  In  the  case  of  the  positive  mode  this  represents  a  choice  of  1-D  that  yields  a 
rate  of  selection  that  is  nearly  equivalent  to  the  optimum  for  the  negative  mode. 

The  allowed  regions  for  selection  permit  one  for  the  first  time  to  specify  precisely  what  is  meant 
by  high  and  low  demand.  With  the  nominal  values  for  the  parameters  of  the  lactose  and  maltose 
operons  in  E.  coli,  selection  of  the  negative  mode  of  control  requires  a  demand  between 
0.000005  and  0.1,  whereas  selection  of  the  positive  mode  requires  a  demand  between  0.2  and 
0.999985.  Furthermore,  these  regions  exhibit  the  predicted  asymmetry  with  the  positive  mode 
having  the  larger  region  within  which  selection  is  realizable. 
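
As a small illustration, the demand ranges quoted above can be encoded as a lookup. This is only a sketch: the numerical bounds are the nominal values stated in the text for the lactose and maltose operons, while the function and constant names are hypothetical.

```python
# Nominal demand ranges quoted in the text for the lac and mal operons.
NEGATIVE_MODE_RANGE = (0.000005, 0.1)
POSITIVE_MODE_RANGE = (0.2, 0.999985)


def selectable_mode(demand):
    """Return which mode of control is selectable at a given demand D.

    Returns "negative", "positive", or None when D falls outside both ranges.
    """
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand must lie between 0 and 1")
    low, high = NEGATIVE_MODE_RANGE
    if low <= demand <= high:
        return "negative"
    low, high = POSITIVE_MODE_RANGE
    if low <= demand <= high:
        return "positive"
    return None


for d in (0.001, 0.15, 0.99):
    print(d, selectable_mode(d))
```

The values D=0.001 and D=0.99 (that is, 1-D=0.01) correspond to the optima discussed earlier and fall in the negative and positive ranges respectively; D=0.15 lies in the gap between the two ranges.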

The  quantitative  theory  reveals  a  number  of  new  relationships  involving  cycle  time  that  can  be 
tested  against  experimental  data  in  the  case  of  the  lactose  and  maltose  operons  of  E.  coli.  The 
first such relationship provides an estimate for the minimum value of the cycle time Cmin. We
obtained values of 26 hours and 10 hours, which are on the same order of magnitude as the 40
hours required on average for transit through the entire intestinal tract. Under these
circumstances, E. coli is simply passing through the intestinal tract without colonizing the colon.
Clearly, the cycle time can be no shorter than this period.

The second relationship provides an estimate for the maximum value of the cycle time Cmax.
We have estimated this value to be approximately 580,000 hours (~66 years) in the case of the
lactose operon and 502,000 hours (~57 years) in the case of the maltose operon. These values
for  Cmax  are  on  the  same  order  of  magnitude  as  the  120-year  maximum  for  the  life  span  of 
humans.  Clearly,  the  cycle  time  for  E.  coli  can  be  no  longer  than  the  life-time  of  the  host 
because  the  bacteria  will  die  with  the  host  if  they  do  not  colonize  a  new  host. 

The final relationship provides an estimate for the optimal value of the cycle time Copt. The
optimal extent and rate of selection determined for the lactose operon suggest a demand in the
neighborhood of Dopt = 0.001. This value of D, taken together with the measured three-hour
exposure time to lactose in humans (D = 3/C), predicts an optimal cycle time of Copt = 3000
hours (~4 months). The corresponding estimate based on the maltose operon is Copt = 800
hours  (~  33  days).  These  predicted  values  for  the  cycle  time  of  E.  coli  are  comparable  with  the 
cycle  times  (recolonization  rates)  of  months  to  years  that  have  been  observed  in  humans  for 
resident  strains  of  E.  coli. 
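
The arithmetic behind these estimates can be checked with a few lines of code. This sketch assumes only the relationship stated in the text, namely that demand is the exposure time divided by the cycle time; the function names and the 365.25-day year are illustrative choices.

```python
HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.25  # assumed average year length


def cycle_time_from_demand(exposure_hours, demand):
    """Invert D = exposure / C to obtain the cycle time C in hours."""
    return exposure_hours / demand


def hours_to_days(hours):
    return hours / HOURS_PER_DAY


def hours_to_years(hours):
    return hours / (HOURS_PER_DAY * DAYS_PER_YEAR)


# Lactose operon: three-hour exposure and optimal demand of 0.001.
c_opt_lactose = cycle_time_from_demand(3.0, 0.001)
print(f"optimal cycle time (lac): {c_opt_lactose:.0f} h "
      f"= {hours_to_days(c_opt_lactose):.0f} days")

# Unit conversions for the other cycle-time estimates quoted in the text.
print(f"800 h     = {hours_to_days(800):.1f} days")
print(f"580,000 h = {hours_to_years(580_000):.1f} years")
print(f"502,000 h = {hours_to_years(502_000):.1f} years")
```

A cycle time of 3000 hours is 125 days, which is the basis for the figure of roughly four months.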

In  summary,  the  quantitative  development  of  demand  theory  presented  in  the  first  paper  and 
applied  in  the  second  provides  the  first  estimates  for  the  high  and  low  values  of  demand  that  are 
required  for  selection  of  the  positive  and  negative  modes  of  gene  control.  The  specific 
application  to  the  maltose  and  lactose  operons  of  E.  coli  suggests  that  the  positive  and  negative 
modes  of  control  for  these  genes  are  subject  to  selection  throughout  the  full  range  of  cycle  times 
that  are  possible  for  this  microbe.  Moreover,  the  cycle  times  predicted  on  the  basis  of  the 
optimal  extent  and  rate  of  selection  are  in  agreement  with  the  typical  cycle  times  that  have  been 
observed  experimentally. 

The  quantitative  version  of  demand  theory  integrates  information  at  the  level  of  DNA  (mutation 
rate,  effective  target  sizes  for  mutation  of  regulatory  proteins,  promoter  sites,  and  modulator 
sites),  physiology  (selection  coefficients  for  superfluous  expression  of  an  unneeded  function  and 
for  lack  of  expression  of  an  essential  function),  and  ecology  (environmental  context  and  life 
cycle)  and  makes  rather  surprising  predictions  connected  to  the  intestinal  physiology  and  life 
span  of  the  host  and  to  the  rate  for  recolonizing  the  host.  Two  additional  approaches  have 
yielded  results  that  provide  new  insight  into  the  normal  physiological  operation  of  this  gene 
circuit  in  the  organism’s  natural  environment. 

First,  we  have  extended  our  initial  analysis  to  include  the  AND  logic  by  which  expression  of  the 
lactose  operon  is  determined  by  the  absence  of  the  lactose-specific  repressor  and  the  presence  of 
the global CAP activator [4]. When the logic of combined control by CAP-cAMP activator and
Lac  repressor  was  analyzed,  we  found  an  optimum  set  of  values  not  only  for  the  exposure  to 
lactose,  but  also  for  the  exposure  to  glucose  and  for  the  relative  phasing  between  the  periods  of 
exposure. 
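
The AND logic described above can be written as a Boolean truth table. This is an idealized sketch: real expression levels are graded rather than strictly ON or OFF, and the function and variable names are hypothetical. It assumes the standard picture in which lactose relieves repression and the absence of glucose allows the CAP-cAMP activator to act.

```python
def lac_operon_on(lactose_present, glucose_present):
    """Idealized Boolean model of lac operon expression.

    Expression requires both the absence of bound Lac repressor
    and the presence of active CAP-cAMP activator.
    """
    repressor_bound = not lactose_present  # lactose relieves repression
    cap_active = not glucose_present       # glucose absence permits activation
    return (not repressor_bound) and cap_active


# Print the full truth table.
for lactose in (False, True):
    for glucose in (False, True):
        state = "ON" if lac_operon_on(lactose, glucose) else "OFF"
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination of lactose present and glucose absent yields expression, which is why the relative phasing of the two exposure periods matters.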

Second, we also have analyzed the alternative types of switching behavior that can be exhibited
by the lactose operon in Escherichia coli [5]. The analysis showed that slight changes in the
connectivity of the circuit (made possible by the use of metabolic analogues) can alter the
switching  behavior  from  a  continuous  graded  response  to  a  discontinuous  all-or-none  expression 
of  the  lac  operon. 

Distinguish  Design  Features  That  Occur  as  the  Result  of  Natural  Selection  from  Those  That 
Occur  With  High  Probability  by  Chance 

We  have  been  comparing  alternative  designs  for  achieving  the  same  function  in  an  effort  to 
identify  those  designs  that  would  arise  frequently  by  chance  and  that  would  be  robust  in  the  face 
of  mutational  entropy.  Any  method  to  identify  the  critical  differences  between  alternative 
designs  for  gene  circuitry  must  be  able  to  distinguish  those  design  features  that  are  the  result  of 
natural  selection  from  those  that  occur  with  high  probability  by  chance.  The  principal  method 
we  have  used  for  most  of  our  studies  is  the  method  of  mathematically  controlled  comparisons. 

The  method  involves  the  following  steps.  (1)  The  alternatives  being  compared  are  restricted  to 
having  differences  in  a  single  specific  process  that  remains  embedded  within  its  natural  milieu. 
(2)  The  values  of  the  parameters  that  characterize  the  unaltered  processes  of  one  system  are 
assumed  to  be  strictly  identical  with  those  of  the  corresponding  parameters  of  the  alternative 
system.  This  equivalence  of  parameter  values  within  the  systems  is  called  internal  equivalence. 
It  provides  a  means  of  nullifying  or  diminishing  the  influence  of  the  background,  which  in 
complex  systems  is  largely  unknown.  (3)  Parameters  associated  with  the  changed  process  are 
initially  free  to  assume  any  value.  This  allows  the  creation  of  new  degrees  of  freedom.  (4)  The 
extra  degrees  of  freedom  are  then  systematically  reduced  by  imposing  constraints  on  the  external 
behavior  of  the  systems.  In  this  way,  the  two  systems  are  made  as  nearly  equivalent  as  possible 
in  their  interactions  with  the  outside  environment.  This  is  called  external  equivalence.  (5)  The 
constraints  imposed  by  external  equivalence  fix  the  values  of  the  altered  parameters  in  such  a 
way  that  arbitrary  differences  in  systemic  behavior  are  eliminated.  Functional  differences  that 
remain  between  alternative  systems  with  maximum  internal  and  external  equivalence  constitute 
irreducible  differences.  (6)  When  all  degrees  of  freedom  have  been  eliminated,  and  the 
alternatives  are  as  identical  as  they  can  be,  then  comparisons  are  made  by  rigorous  mathematical 
and  computer  analyses  of  the  alternatives. 
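
The steps above can be sketched for a deliberately small case. The example assumes a hypothetical one-variable power-law model, dX/dt = alpha * X0**g * X**g_fb - beta * X**h, in which a negative g_fb represents feedback inhibition; the alternative design has g_fb = 0. None of the parameter values come from the cited papers, and the property names are illustrative.

```python
import math


def steady_state(alpha, g, g_fb, beta, h, x0):
    """Steady state of dX/dt = alpha*x0**g * X**g_fb - beta*X**h."""
    ln_x = (math.log(alpha) - math.log(beta) + g * math.log(x0)) / (h - g_fb)
    return math.exp(ln_x)


def properties(alpha, g, g_fb, beta, h, x0):
    """Systemic properties evaluated at the steady state."""
    x = steady_state(alpha, g, g_fb, beta, h, x0)
    flux = beta * x ** h
    return {
        "x": x,
        "log_gain": g / (h - g_fb),              # d ln X / d ln X0
        "sens_beta": -1.0 / (h - g_fb),          # d ln X / d ln beta
        "rate_const": (flux / x) * (h - g_fb),   # magnitude of local eigenvalue
    }


# Reference design with feedback inhibition (hypothetical values).
ref = dict(alpha=2.0, g=1.0, g_fb=-0.5, beta=1.0, h=0.5, x0=1.5)

# Internal equivalence: beta, h, and x0 are copied unchanged.
# External equivalence: choose g and alpha of the alternative so that
# the log gain and the steady state match those of the reference.
g_alt = ref["g"] * ref["h"] / (ref["h"] - ref["g_fb"])
x_ref = steady_state(**ref)
alpha_alt = ref["beta"] * x_ref ** ref["h"] / ref["x0"] ** g_alt
alt = dict(alpha=alpha_alt, g=g_alt, g_fb=0.0,
           beta=ref["beta"], h=ref["h"], x0=ref["x0"])

# Differences that remain are the irreducible differences.
for name, design in (("with feedback", ref), ("without feedback", alt)):
    print(name, properties(**design))
```

With these values the two designs share the same steady state and log gain, while the design with feedback shows a smaller sensitivity to beta and a larger local rate constant. In this toy case those remaining differences play the role of the irreducible differences.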

In  some  cases,  the  results  obtained  using  this  technique  are  general  and  independent  of  parameter 
values  and  the  answers  are  clear-cut.  In  others,  the  result  might  be  general,  but  the 
demonstration  is  difficult  and  numerical  results  with  specific  parameter  values  can  help  to  clarify 
the  situation.  In  either  situation,  numerical  results  with  specific  parameter  values  also  can 
provide  an  answer  to  the  question  of  how  much  better  one  of  the  alternatives  might  be.  In 
contrast,  a  more  ambiguous  result  is  obtained  when  either  of  the  alternatives  can  have  the  larger 
value  for  a  given  systemic  property,  depending  on  the  specific  values  of  the  parameters.  In  any 
case,  introduction  of  specific  values  for  the  parameters  reduces  the  generality  of  the  results.  A 
numerical  approach  to  this  problem  has  been  developed  that  combines  the  method  of 
mathematically  controlled  comparison  with  statistical  techniques  to  yield  numerical  results  that 
are  general  in  a  statistical  sense.  These  developments  have  been  reported  in  a  series  of  five 
papers  [6-10]. 


The  first  task  in  developing  a  statistical  version  of  mathematically  controlled  comparisons  was  to 
expand  the  usual  methodology  of  making  statistical  comparisons.  When  dealing  with  questions 
that  concern  a  general  class  of  models  for  biological  networks,  large  numbers  of  distinct  models 
within  the  class  can  be  grouped  into  an  ensemble  that  gives  a  statistical  view  of  the  properties  for 
the  general  class.  However,  comparing  properties  of  different  ensembles  through  the  use  of  the 
usual  point  measures  (e.g.  medians,  standard  deviations,  correlation  coefficients)  can  mask 
inhomogeneities  in  the  correlations  between  properties.  We  were  therefore  motivated  to  develop 
strategies  that  allow  these  inhomogeneities  to  be  more  easily  detected  [6].  First,  we  take 
advantage  of  the  regular  systematic  structure  of  the  power-law  formalism  to  construct  ensembles 
of  parameter  sets  for  both  the  reference  model  and  the  alternative  model  in  question.  Second, 
these  realizations  of  the  two  alternative  model  designs  are  analyzed  to  characterize  their  systemic 
behaviors.  Third,  these  are  then  compared  by  means  of  a  novel  “Density  of  Ratios  Plot”. 
Techniques  involving  moving  quantiles  are  introduced  to  generate  secondary  plots  in  which 
correlations  and  inhomogeneities  in  correlations  are  more  easily  detected.  Finally,  we  provided 
several  examples  to  illustrate  the  advantages  of  these  techniques. 
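
A moving-quantile summary of the kind described above can be sketched as follows. This is an illustrative reading of the technique, not the implementation reported in [6]: the ensemble is drawn from a hypothetical one-variable power-law model, the sampling ranges are assumptions, and the moving median stands in for the more general moving quantiles.

```python
import random
import statistics


def sample_ratio_pairs(n, seed=0):
    """Build an ensemble of (P_ref, P_ref / P_alt) pairs.

    P is the magnitude of a parameter sensitivity in a hypothetical
    model with (reference) and without (alternative) feedback inhibition.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        h = rng.uniform(0.1, 1.0)      # kinetic order of removal (assumed range)
        g_fb = -rng.uniform(0.0, 1.0)  # feedback kinetic order, zero or negative
        p_ref = 1.0 / (h - g_fb)
        p_alt = 1.0 / h
        pairs.append((p_ref, p_ref / p_alt))
    return pairs


def moving_median(pairs, window):
    """Slide a window over the pairs sorted by P_ref.

    Returns a list of (median P_ref, median ratio), one entry per window.
    """
    ordered = sorted(pairs)
    trend = []
    for i in range(len(ordered) - window + 1):
        chunk = ordered[i:i + window]
        trend.append((statistics.median([p for p, _ in chunk]),
                      statistics.median([r for _, r in chunk])))
    return trend


pairs = sample_ratio_pairs(200)
trend = moving_median(pairs, window=21)
print(f"{len(pairs)} samples, {len(trend)} moving-median points")
```

Plotting the raw pairs gives the scatter of ratios, and overlaying `trend` shows how the typical ratio changes across the range of the reference property, which is where inhomogeneities in the correlation become visible.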

The  first  uses  of  the  graphical  and  statistical  methods  presented  in  the  previous  paper  were  to 
examine  how  the  different  systemic  properties  of  a  biochemical  network  are  correlated  with  one 
another  and  how  the  specification  of  particular  systemic  properties  biases  the  distribution  of  the 
underlying  parameter  values  [7].  To  keep  the  application  as  transparent  as  possible  we  examined 
a simple unbranched biosynthetic pathway subject to control by feedback inhibition. After
constructing  a  large  ensemble  of  randomly  generated  sets  of  parameter  values,  the  structural  and 
behavioral  properties  of  the  model  with  these  parameter  sets  were  examined  statistically  and 
classified.  The  results  of  our  analysis  demonstrated  that  certain  properties  of  these  systems  are 
strongly  correlated,  thereby  revealing  aspects  of  organization  that  are  highly  probable 
independent  of  selection.  This  is  an  aspect  of  the  usual  approach  whereby  parameter  values  are 
examined  to  determine  their  influence  on  systemic  behavior.  We  have  also  taken  the  opposite 
view  to  learn  how  selection  for  particular  systemic  behaviors  influences  the  frequency 
distribution  of  parameter  values.  Information  on  the  distribution  of  parameter  values  is  of 
interest  in  the  design  of  experiments  to  measure  the  parameters  in  actual  systems.  By  knowing 
the  most  probable  values  of  a  parameter,  one  can  design  experiments  to  target  that  range.  The 
results  would  allow  us  to  make  predictions  about  the  range  of  values  most  likely  to  generate 
systems  with  the  known  behavior. 

Having  developed  the  statistical  methods  and  examined  the  correlations  among  the  properties  of 
a  system,  we  used  these  methods  for  the  specific  purpose  of  extending  the  method  of 
Mathematically  Controlled  Comparison  [8].  This  method  has  been  used  for  some  time  to 
determine  which  of  two  alternative  regulatory  designs  is  better  according  to  specific  quantitative 
criteria  for  functional  effectiveness.  We  were  motivated  to  develop  and  apply  statistical  methods 
that  would  permit  the  use  of  numerical  values  for  the  parameters  and  yet  retain  some  of  the 
generality  that  makes  Mathematically  Controlled  Comparison  so  attractive.  We  illustrated  this 
new  numerical  method  in  a  step-by-step  application  using  a  very  simple  didactic  example.  We 
also  validated  the  results  by  comparison  with  the  corresponding  results  obtained  using  the 
previously  developed  analytical  method.  The  numerical  method  confirmed  the  qualitative 
differences  between  the  systemic  behavior  of  alternative  designs  obtained  from  the  analytical 
method. In addition, the numerical method allowed for quantification of the differences and it
provided  results  that  are  general  in  a  statistical  sense. 

The  statistical  strategy  for  making  mathematically  controlled  comparisons,  after  having  been 
developed and validated, was used to address two long-standing biological questions: (1) Why is
the  pattern  of  overall  feedback  inhibition  a  nearly  universal  design  feature  of  unbranched 
biosynthetic  pathways?  (2)  Why  do  most  unbranched  biosynthetic  pathways  have  irreversible 
reactions  near  their  beginning,  many  times  at  the  first  step? 

We  approached  the  first  of  these  questions  by  examining  pathways  with  an  arbitrary  pattern  of 
feedback  interactions  and  an  otherwise  equivalent  pathway  without  the  overall  feedback 
interaction  [9].  Our  statistical  method  allowed  the  rigorous  determination  of  the  changes  in 
systemic  properties  that  can  be  exclusively  attributed  to  overall  feedback  inhibition.  Analytical 
results  show  that  the  unbranched  pathway  can  achieve  the  same  steady-state  flux,  concentrations 
and  logarithmic  gains  with  respect  to  changes  in  substrate,  with  or  without  overall  feedback 
inhibition.  The  analytical  approach  also  shows  that  overall  feedback  inhibition  amplifies  the 
regulation  of  flux  by  the  demand  for  end  product  while  attenuating  the  sensitivity  of  the 
concentrations  to  the  same  demand.  This  approach  does  not  provide  a  clear  answer  regarding  the 
effect  of  overall  feedback  inhibition  on  the  robustness,  stability  and  transient  time  of  the 
pathway.  However,  the  generalized  numerical  method  we  used  does  clarify  the  answers  to  these 
questions.  On  average,  an  unbranched  pathway  with  overall  feedback  inhibition  is  less  sensitive 
to  perturbations  in  the  values  of  the  parameters  that  define  the  system.  On  average,  overall 
feedback  inhibition  decreases  the  stability  margins  by  a  minimal  amount  (typically  less  than 
5%).  Finally,  and  again  on  average,  stable  systems  with  overall  feedback  inhibition  respond 
faster  to  fluctuations  in  the  metabolite  concentrations.  Taken  together  these  results  show  that 
overall  feedback  inhibition  confers  several  functional  advantages  upon  unbranched  pathways. 
These  advantages  provide  a  rationale  for  the  prevalence  of  this  mechanism  in  unbranched 
metabolic  pathways  in  vivo. 

We  approached  the  second  of  these  questions  by  systematically  varying  the  position  of  the 
irreversible  reaction  in  model  pathways  and  comparing  the  systemic  behavior  according  to 
several  criteria  for  functional  effectiveness  using  the  method  of  mathematically  controlled 
comparisons  [10].  This  technique  minimizes  extraneous  differences  in  systemic  behavior  and 
identifies  those  that  are  fundamental.  Our  results  show  that  a  pathway  with  an  irreversible 
reaction  located  at  the  first  step,  and  all  other  reactions  reversible,  is  on  average  better  than  an 
otherwise  equivalent  pathway  with  all  reactions  reversible,  which  in  turn  is  on  average  better 
than  an  otherwise  equivalent  pathway  with  an  irreversible  reaction  located  at  any  step  other  than 
the  first.  Pathways  with  an  irreversible  first  reaction  and  low  concentrations  of  intermediates 
(one  of  the  primary  criteria  for  functional  effectiveness)  exhibit  the  following  profile  when 
compared  to  fully  reversible  pathways:  changes  in  the  concentration  of  intermediates  in 
response  to  changes  in  the  level  of  initial  substrate  are  equally  low,  the  robustness  of  the 
intermediate  concentrations  and  of  the  flux  is  similar,  the  margins  of  stability  are  similar,  flux  is 
more  responsive  to  changes  in  demand  for  end  product,  intermediate  concentrations  are  less 
responsive  to  changes  in  demand  for  end  product,  and  transient  times  are  shorter.  These  results 
provide  a  functional  rationale  for  the  positioning  of  irreversible  reactions  at  the  beginning  of 
unbranched  biosynthetic  pathways. 

Significance / Impact / Navy Relevance:

While  it  is  obvious  that  gene  circuits  are  capable  of  performing  various  computational  tasks,  it  is 
not  clear  why  particular  circuitry  has  evolved  to  perform  these  tasks.  If  we  can  develop  a  deeper 
understanding  of  the  basic  design  principles,  significant  biological  consequences  would  follow. 
Moreover,  we  might  be  able  to  supply  an  appropriate  context  for  the  directed  evolution  of  robust 
circuits  with  desirable  computational  properties,  which  could  also  have  important  technological 
ramifications. 

Several  elements  of  design,  each  exhibiting  a  variety  of  realizations,  have  been  identified  among 
elementary  gene  circuits  in  prokaryotic  organisms.  Design  principles  that  appear  to  govern  the 
realization  for  some  of  these  elements  have  been  identified  by  the  use  of  well-controlled 
mathematical  comparisons.  Work  on  this  grant  has  contributed  to  the  identification  of  four  such 
principles.  These  make  specific  predictions  regarding  (1)  two  alternative  modes  of  gene  control, 
(2)  three  patterns  of  coupling  gene  expression  in  elementary  circuits,  (3)  two  types  of  switches  in 
inducible  gene  circuits,  and  (4)  the  realizability  of  alternative  gene  circuits  and  their  response  to 
phased  environmental  cues.  In  each  case,  the  predictions  are  supported  by  experimental 
evidence.  These  results  are  important  for  understanding  the  normal  function  of  gene  circuits; 
they  also  are  potentially  important  for  developing  judicious  methods  to  redirect  normal 
expression  for  biotechnological  purposes  or  to  correct  pathological  expression  for  therapeutic 
purposes. 

In  this  third  phase  of  our  work  we  have  begun  to  address  the  task  of  using  directed  evolution  to 
design  circuits  that  perform  specific  functions.  The  statistical  methods  we  developed  have 
helped  us  identify  designs  that  meet  specific  performance  criteria  by  starting  with  random 
assignment  of  parameter  values  and  applying  selection.  The  resulting  designs  that  occur  with 
high  probability  are  likely  to  be  robust.  If  this  approach  proves  successful  for  a  wide  variety  of 
circuits  performing  diverse  functions,  then  it  would  provide  the  basis  for  experimental 
approaches  aimed  at  a  biological  realization  of  these  circuits. 

A  review  covering  all  of  the  work  undertaken  during  the  period  of  this  grant  has  recently  been 
published  [11].  This  work  has  motivated  us  to  continue  our  investigation  of  randomly-generated 
gene  circuits  and  to  focus  our  attention  on  the  role  of  alternative  forms  of  connectivity.  The  first 
paper  representing  this  new  work  in  progress  is  currently  under  revision  [12]. 

Cells possess the genes required for growth and function in a variety of contexts. In any
given  context  there  is  a  corresponding  pattern  of  gene  expression  in  which  some  genes  are 
OFF  and  others  ON.  The  ability  of  cells  to  switch  genes  ON  and  OFF  in  a  coordinate  fashion 
to  produce  the  required  patterns  of  expression  is  the  fundamental  basis  for  complex 
processes  like  normal  development  and  pathogenesis.  The  molecular  study  of  gene 
regulation  has  revealed  a  plethora  of  mechanisms  and  circuitry  that  have  evolved  to  perform 
what  appears  to  be  the  same  switching  function.  To  some  this  implies  the  absence  of  rules. 
However,  simple  rules  capable  of  relating  molecular  design  to  the  natural  environment  have 
begun  to  emerge  through  the  analysis  of  elementary  gene  circuits.  Two  of  these  rules  are 
reviewed  in  this  paper.  These  simple  rules  have  the  ability  to  unify  understanding  across 
several  different  levels  of  biological  organization  -  molecular,  physiological, 
developmental,  ecological. 


1.  Introduction 


Regulation  of  gene  expression  and  its  systemic  manifestations  are  subjects  of 
intense  study.  As  a  result  of  this  effort  we  shall  soon  have  identified  all  of  the 
genes  and  proteins  for  a  number  of  simpler  organisms.  Despite  this  enormous 
progress  we  are  still  at  a  loss  to  understand  the  integrated  behavior  of  the  organism. 
Our  understanding  is  still  fragmentary  and  descriptive.  We  are  unable  to  predict 
changes  in  the  organism’s  behavior  when  it  is  placed  in  a  novel  environment  or 
when  a  change  is  made  in  one  of  its  genes.  Little  is  known  about  the  forces  that 
lead  to  the  selection  or  maintenance  of  a  specific  mechanism  for  the  regulation  of  a 
given  set  of  genes  in  a  particular  organism.  Is  this  process  random,  or  is  it 
governed  by  rules?  The  answer  to  this  question  is  important.  It  will  help  us  to 
understand  the  evolution  of  gene  regulation;  it  also  will  help  us  to  develop  judicious 
methods  of  redirecting  normal  expression  for  biotechnological  purposes  or  of 
correcting  pathological  expression  for  therapeutic  purposes. 

Our  goal  is  to  understand  the  integrated  structure  and  function  of 
organizationally  complex  systems  in  relation  to  their  underlying  molecular 
determinants.  Moreover,  we  are  particularly  interested  in  identifying  the  rule-like 
properties  of  these  systems  that  would  allow  for  some  algorithmic  compression  in 
their  representation,  and  not  simply  a  compilation  of  all  the  molecular  details. 

In  pursuit  of  this  goal  we  have  developed  a  canonical  nonlinear  formalism  that 
has  desirable  properties  for  the  representation  and  analysis  of  organizationally 
complex  systems  (1).  This  formalism  has  been  used  to  characterize  alternative 
modes of gene control and various forms of coupling among elementary gene
circuits.  The  results  allow  us  to  identify  a  set  of  rules,  or  design  principles,  that 
govern  the  natural  selection  of  gene  circuits.  Here  we  shall  review  the  relevant 
biological  background  and  then  present  results  from  our  analysis  of  gene  circuitry. 


2.  Biological  Background 

The  common  metaphor  of  the  genome  as  a  blueprint  for  construction  of  the 
organism  masks  the  difficult  task  of  relating  structure  and  function  of  the  intact 
organism  to  its  underlying  genetic  determinants  (2).  The  behavior  of  an  intact 
biological  system  can  seldom  be  related  directly  to  its  underlying  molecular 
determinants.  There  are  several  different  levels  of  hierarchical  organization  that  are 
relevant.  For  our  present  purposes  it  will  be  sufficient  to  consider  four  different 
levels  —  genome  sequence,  transcriptional  unit,  elementary  gene  circuit, 
environmental  context. 

2.1. The DNA sequence constitutes the genome

The  recent  sequencing  of  the  complete  genome  for  a  number  of  simpler  organisms, 
and  the  projected  completion  of  the  sequence  for  the  human  genome  by  the  year 
2005,  illustrate  the  power  of  modern  molecular  biology  to  resolve  complex  systems 
into their simplest elements. The four bases (A, T, G, and C) are strung
together  in  sequences  that  are  mind-numbing  in  their  simplicity;  yet,  these 
sequences  provide  the  potential  for  incredible  complexity.  Whether  it  be  the 
versatile  metabolism  of  free-living  microbes  that  can  adapt  to  nearly  any 
environment,  or  the  sophisticated  structures  of  multicellular  organisms  that  can  be 
seen  in  near  endless  variety,  the  physical  basis  for  this  complexity  is  the  context- 
dependent  expression  of  the  organism’s  genome. 

2.2.  Information  is  encoded  in  transcriptional  units 

The  mapping  from  DNA  level  to  organismal  level  requires  a  deeper  understanding  of 
how  information  is  encoded  in  the  genome.  DNA  sequences  are  organized  into 
functional  units  that  consist  of  structural  genes  flanked  by  a  start  sequence  at  which 
transcription  begins  and  a  termination  sequence  at  which  it  ends.  In  addition,  there 
are  a  number  of  regulatory  sites  capable  of  binding  specific  transcription  factors  that 
interact  with  the  transcription  machinery  to  modulate  the  rate  of  transcription 
initiation or termination (Fig. 1).

Figure 1. Unit of transcription. Structural genes (G1 and G2) are bounded by a
promoter sequence (P) and a terminator sequence (T), and preceded by upstream
modulator sites (M1 and M2) that bind regulators (R1 and R2) capable of altering
transcription initiation. The solid arrow represents the mRNA transcript and the
scalloped lines indicate the protein products encoded by genes G1 and G2.

2.3.  Expression  is  organized  into  elementary  gene  circuits 

Transcription  of  DNA  is  but  one  step  in  a  cascade  of  information  flow  that 
constitutes  the  expression  of  a  gene  (Fig.  2).  Each  stage  of  such  a  cascade  is  a 
potential  site  at  which  expression  can  be  regulated  in  a  context-dependent  fashion. 
The  context  is  provided  by  the  life  cycle  of  the  organism,  and  the  interlocking 
mechanisms of gene regulation interpret that context.

Figure 2. Cascade of information flow from DNA to RNA to protein to metabolite.
The processes of synthesis and degradation are represented by horizontal arrows,
whereas the catalytic and regulatory influences are represented by vertical arrows.
An effector circuit is shown on the right and a regulator circuit is shown on the left.


2.4.  Physiology  and  ecology  are  reflected  in  the  organism's  life  cycle 

The  life  cycle  of  some  organisms  is  largely  programmed  development  from  egg  to 
embryo  to  mature  adult  and  back  to  the  egg  (3).  In  other  organisms  it  is  dominated 
by random events involving a pathogen's ability to encounter one host, to exploit or
colonize that host for a period of time, to escape into a secondary environment, and
to  survive  there  until  an  encounter  with  a  subsequent  host  (4).  In  each  case,  specific 
genes  function  in  some  phases  of  the  organism's  life  cycle  but  not  in  others. 
Differential  patterns  of  expression  are  exhibited  as  the  context  changes  from  one 
phase  to  the  next  and  one  set  of  genes  is  switched  OFF  while  another  set  is  switched 
ON  in  a  combinatorial  fashion. 

Gene  regulation  —  the  ability  to  switch  gene  expression  ON  and  OFF  in 
appropriate  temporal  and  spatial  patterns  —  is  central  to  modern  biology.  The 
inability  to  express  a  gene  when  it  should  be  ON,  or  the  inappropriate  expression  of 
a  gene  when  it  should  be  OFF,  is  usually  dysfunctional  and  often  lethal.  The 
determination  of  what  constitutes  appropriate  expression  requires  knowledge  of  the 
molecular  mechanism,  the  physiological  function  it  realizes,  and  the  environmental 
demand  for  that  function. 

Organisms  regulate  expression  of  their  genome  by  means  of  a  diverse  repertoire 
of  molecular  mechanisms.  Most  of  the  well-characterized  examples  have  come  from 
the  study  of  prokaryotes.  Although  the  situation  is  typically  more  complex  in 
eukaryotes  and  there  are  undoubtedly  some  aspects  of  regulation  unique  to  higher 
organisms,  the  general  themes  are  much  the  same  in  both  and  most  mechanisms 
that  were  originally  thought  to  be  unique  to  eukaryotes  have  subsequently  been 
observed  within  the  prokaryotic  realm.  For  our  analysis,  we  have  abstracted  the 
generic  features  of  gene  regulation  that  are  thought  to  be  common  to  both,  but  for 
testing  our  predictions  we  have  turned  to  the  more  numerous  and  well-characterized 
prokaryotic systems. The extent to which the results might differ for eukaryotes
remains  to  be  determined. 


3. Rules for the Molecular Mode of Gene Control

One  of  the  first  variations  in  design  to  be  well  documented  is  that  involving 
positive vs. negative modes of gene control (Fig. 3). For example, the lactose (lac)
catabolic system in Escherichia coli is governed by a classical repressor protein (5),
the negative mode of control. Induction of gene expression in this system is
achieved by the addition of an inducer that removes the repressor protein to allow
transcription. The maltose (mal) system in E. coli, by contrast, is governed by an
activator  protein  (6),  the  positive  mode  of  control.  Induction  in  this  case  is  achieved 
by  the  addition  of  an  inducer  that  converts  the  activator  protein  into  its  functional 
form  that  facilitates  transcription.  What  is  the  significance  of  this  variation  in 
design? 

Figure 3. Alternative molecular modes of controlling gene expression.

This difference in design was originally believed to have no functional
significance. Subsequent analysis showed that mode of control is related to the
demand for expression of the regulated gene in the organism’s natural environment
(7). The analysis of mathematical models with either the positive or negative mode
of control showed that in most respects their behavior can be identical. However,
they behave in diametrically opposed ways to mutations in the components of the
regulatory  mechanism  itself.  Mutants  altered  in  the  positive  mechanism  are  unable 
to  express  the  corresponding  gene  product  despite  the  presence  of  inducer,  whereas 
mutants  altered  in  the  negative  mechanism  express  the  corresponding  gene  product 
even  in  the  absence  of  inducer.  The  relative  growth  of  mutant  and  wild-type 
organisms  was  examined  in  high-  and  low-demand  environments.  The  high-demand 
environment,  in  which  high-level  expression  is  frequently  required  for  the 
organism's  survival,  leads  to  selection  of  the  positive  mode  of  gene  control;  the 
low-demand  environment  leads  to  selection  of  the  negative  mode.  Thus,  molecular 
mode  of  control  is  correlated  with  level  of  demand  for  expression  of  the  regulated 
gene  product  in  the  organism’s  natural  environment  (Table  1).  These  qualitative 
predictions  are  well  supported  by  experimental  evidence  (8). 


Table 1. Predicted correlation between demand for expression and mode of control

Demand for expression    Mode of regulation
                         Positive               Negative

High                     Regulation selected    Regulation lost
Low                      Regulation lost        Regulation selected

In recent analysis we have examined the quantitative implications of this
demand  theory  (in  preparation).  First,  we  define  two  key  parameters:  the  cycle  time 
C,  which  is  the  average  time  for  a  gene  to  cycle  through  the  OFF  state,  the  ON 
state,  and  back  to  the  OFF  state;  and  demand  D,  which  is  the  fraction  of  the  cycle 
time  that  the  gene  is  ON.  Second,  a  quantitative  analysis  involving  mutation  rates 
and  growth  rates  reveals  non-overlapping  regions  in  the  C  vs.  D  space  for  which 
selection  of  wild-type  regulatory  mechanisms  with  the  negative  or  the  positive  mode 
is  realizable  (Fig.  4). 

The  quantitative  theory  specifies  more  precisely  what  we  mean  by  high  and  low 
demand.  As  can  be  seen  in  Figure  4,  with  the  nominal  values  for  the  parameters  of 
the lactose and maltose operons in E. coli, selection of the negative mode of control
requires  a  demand  less  than  0.04,  whereas  selection  of  the  positive  mode  requires  a 
demand  greater  than  0.32. 
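
The two limits quoted above can be read as a simple decision rule. The following is a minimal sketch, assuming only the nominal limits stated in the text for the lac and mal parameter values (D below 0.04 for the negative mode, D above 0.32 for the positive mode). The function name and the label "neither" for the gap between the limits are illustrative choices, and the dependence of the limits on cycle time C shown in Figure 4 is ignored.

```python
# Nominal demand limits quoted in the text for lac/mal parameter values in E. coli.
NEGATIVE_MODE_MAX_DEMAND = 0.04
POSITIVE_MODE_MIN_DEMAND = 0.32


def predicted_mode(demand: float) -> str:
    """Return the mode of control whose selection is realizable at this demand.

    `demand` is D, the fraction of the cycle time that the gene is ON (0..1).
    The dependence of the limits on cycle time C is deliberately ignored here.
    """
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand D must lie between 0 and 1")
    if demand < NEGATIVE_MODE_MAX_DEMAND:
        return "negative"
    if demand > POSITIVE_MODE_MIN_DEMAND:
        return "positive"
    return "neither"  # illustrative label for the gap between the two limits


if __name__ == "__main__":
    for d in (0.01, 0.10, 0.50):
        print(f"D = {d:.2f}: {predicted_mode(d)}")
```

Under these assumptions a gene that is ON 1% of the time falls in the region for the negative mode, one that is ON half of the time falls in the region for the positive mode, and intermediate demands fall in neither region.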

Figure 4. Thresholds for discriminate selection of wild-type regulatory mechanisms
with negative or positive modes. There is a maximum value of demand for selection
of the negative mode and a minimum value of demand for selection of the positive
mode. The values of cycle time C and demand D are based on parameter values
for the lac and mal systems in E. coli.

Although these limits on demand are influenced by a number of parameters, by
far the most influential parameter is the reduction in growth rate when there is
excess expression of a gene whose function is not required. The nominal value for
this parameter was set at 5%, based on data for the lactose operon that suggest this
value  as  a  maximum  for  the  reduction  in  growth  rate  of  operator-constitutive 
mutants  in  a  low-demand  environment.  In  the  case  of  the  positive  mode,  the  same 
value  was  used  to  characterize  the  reduction  in  growth  rate  of  an  up-promoter  mutant 
in  a  low-demand  environment.  A  10%  variation  in  this  parameter  yields  a  two-fold 
change  in  the  limits  of  D  for  both  the  negative  and  positive  mode.  The  remaining 
parameters  have  much  less  influence  on  the  limits  of  D;  approximately  half 
exhibit  a  nearly  linear  influence,  whereas  the  other  half  have  a  negligible  influence. 


4.  Rules  for  the  Coupling  of  Elementary  Gene  Circuits 

A  second  variation  in  design  is  that  involving  the  coupling  of  elementary  gene 
circuits  for  regulator  and  effector  genes.  Early  experimental  studies  (9)  suggested 
that  expression  of  regulator  genes  is  invariant  in  some  cases  (classical  regulation), 
such as in the lac system in E. coli, and coordinate with the regulated effector genes
in other cases (autogenous regulation), such as in the histidine utilization (hut)
system in Salmonella typhimurium. Our earlier work focused on the functional
implications  of  these  alternatives,  which  we  now  refer  to  as  the  completely 
uncoupled  and  perfectly  coupled  patterns  of  regulator  and  effector  gene  expression 
(10).  However,  inducible  systems  with  other  patterns  of  gene  expression  were 
subsequently  reported,  and  these  have  become  the  stimulus  to  extend  our  earlier 
work. 

Logically,  there  are  three  qualitatively  distinct  patterns  of  regulator  and  effector 
gene  expression  that  can  be  exhibited  by  an  inducible  system  (Fig.  5).  These  are  the 
directly  coupled,  uncoupled,  and  inversely  coupled  patterns  in  which  regulator  gene 
expression  increases,  remains  the  same,  and  decreases  with  an  increase  in  effector 
gene  expression.  Well-studied  examples  of  direct  coupling,  uncoupling,  and  inverse 
coupling  are  provided  by  the  D-serine  deaminase  (11),  arabinose  (6),  and  methionine 
(12) systems in E. coli.

The  functional  implications  of  direct  coupling,  uncoupling,  and  inverse 
coupling  have  been  determined  from  an  analysis  of  a  generalized  model  capable  of 
representing  these  different  forms  of  coupling  (Fig.  6).  The  fundamental  equations 
that  characterize  this  model  are  mass-balance  equations  that  take  the  general  form 

dX_i/dt = V_{+i}(X_1, \ldots, X_8) - V_{-i}(X_1, \ldots, X_8),    i = 1, \ldots, 5    (1)

The rate laws V_{+i} and V_{-i} describe mass fluxes due to synthetic and degradative
processes.  These  rate  laws  can  be  represented  as  products  of  power-law  functions 
according  to  the  results  of  theoretical  analyses  (1)  and  empirical  case  studies  (13). 
Thus, we can rewrite Eq. 1 to obtain the following system of equations:

dX_1/dt = V_{+1} - V_{-1} = \alpha_1 X_6^{g_{16}} X_3^{g_{13}} X_5^{g_{15}} - \beta_1 X_1^{h_{11}}    (2)

dX_2/dt = V_{+2} - V_{-2} = \alpha_2 X_7^{g_{27}} X_1^{g_{21}} - \beta_2 X_2^{h_{22}}    (3)

dX_3/dt = V_{+3} - V_{-3} = \alpha_3 X_8^{g_{38}} X_2^{g_{32}} - \beta_3 X_2^{h_{32}} X_3^{h_{33}}    (4)

dX_4/dt = V_{+4} - V_{-4} = \alpha_4 X_6^{g_{46}} X_3^{g_{43}} X_5^{g_{45}} - \beta_4 X_4^{h_{44}}    (5)

dX_5/dt = V_{+5} - V_{-5} = \alpha_5 X_7^{g_{57}} X_4^{g_{54}} - \beta_5 X_5^{h_{55}}    (6)

Figure 5. Expression characteristics for (a) effector and (b) regulator gene
expression, plotted against log [substrate]. Three distinct patterns of coupling are
illustrated. Effector gene expression increases while regulator gene expression
(D) increases (directly coupled), (U) remains unchanged (uncoupled), or
(I) decreases (inversely coupled).

Figure 6. Coupled circuits for the expression of regulator and effector genes. Mass
fluxes that characterize the state of the system are represented by horizontal arrows,
whereas catalytic and regulatory influences are represented by vertical arrows. The
influences of the regulator (closed arrowheads) are described by the kinetic orders
g_{15} and g_{45}; the influences of the inducer (open arrowheads) are described by the
kinetic orders g_{13} and g_{43} (see Eqs. 2-6).
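
The power-law structure of Eqs. 2-6 can be sketched numerically as follows. This is an illustration only: every numerical value is hypothetical because the text gives none, the negative kinetic orders on X5 merely stand in for a repressor, and the reading of X6, X7, and X8 as fixed independent variables (with kinetic orders g27 and g57 on X7) is an assumption based on Eq. 1 supplying equations only for i = 1 to 5.

```python
from math import prod

# All numerical values below are hypothetical; the text gives none.
# G[i][j] is the kinetic order g_ij in the synthesis term of the equation for X_i;
# H[i][j] is the kinetic order h_ij in its degradation term.
G = {
    1: {6: 1.0, 3: 1.0, 5: -1.0},  # Eq. 2: g16, g13, g15 (negative g15: repressor)
    2: {7: 1.0, 1: 1.0},           # Eq. 3: g27, g21
    3: {8: 1.0, 2: 1.0},           # Eq. 4: g38, g32
    4: {6: 1.0, 3: 1.0, 5: -1.0},  # Eq. 5: g46, g43, g45
    5: {7: 1.0, 4: 1.0},           # Eq. 6: g57, g54
}
H = {
    1: {1: 1.0},
    2: {2: 1.0},
    3: {2: 1.0, 3: 1.0},
    4: {4: 1.0},
    5: {5: 1.0},
}
ALPHA = {i: 1.0 for i in range(1, 6)}
BETA = {i: 1.0 for i in range(1, 6)}


def power_law(coefficient, orders, x):
    """Evaluate coefficient * product over j of x[j] ** order_j."""
    return coefficient * prod(x[j] ** order for j, order in orders.items())


def derivatives(x):
    """Return dX_i/dt for i = 1..5, given concentrations x[1]..x[8]."""
    return {
        i: power_law(ALPHA[i], G[i], x) - power_law(BETA[i], H[i], x)
        for i in range(1, 6)
    }


if __name__ == "__main__":
    state = {j: 1.0 for j in range(1, 9)}
    print(derivatives(state))   # every synthesis term balances its degradation term
    state[8] = 2.0              # raise the independent variable X8
    print(derivatives(state))   # only dX_3/dt responds immediately
```

With all rate constants and concentrations set to one, each synthesis term equals its degradation term, so the state is a steady state by construction. Doubling X8 perturbs only the balance for X3 at that instant, which reflects the fact that X8 appears only in Eq. 4.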


These  equations  are  used  to  analyze  systems  with  the  positive  or  negative  mode 
of  control  for  each  circuit.  The  effects  of  physicochemical  limitations,  which  arise 
from  the  subunit  structure  of  regulator  proteins  and  place  bounds  on  kinetic  orders  in 
this  model  (10),  are  also  considered.  The  functional  effectiveness  of  these  various 
circuits  has  been  compared  on  the  basis  of  several  properties  (decisiveness, 
efficiency,  selectivity,  robustness,  stability,  and  responsiveness)  that  represent 
possible  criteria  for  natural  selection.  Of  these,  responsiveness  has  proved  the  most 
sensitive  to  variations  in  circuit  design  (14). 

The  results  allow  us  to  predict  a  correlation  between  the  form  of  coupling  and 
the  capacity  for  induction  (ratio  of  maximal  to  minimal  level  of  effector  gene 
expression).  Negatively  controlled  systems  with  low,  intermediate,  and  high 
capacities  for  gene  expression  are  predicted  to  have  direct  coupling,  uncoupling,  and 
inverse coupling, respectively. Positively controlled systems, in contrast, are
predicted to have inverse coupling, uncoupling, and direct coupling, respectively (Table 2).
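
The predicted correlation can be written as a small lookup. The sketch below assumes that the capacity class (low, intermediate, or high) is supplied by the user, since the text does not state numerical boundaries between the classes; the function and dictionary names are illustrative.

```python
# Predicted form of coupling, keyed by (mode of control, capacity class),
# transcribed from the predictions stated in the text.
PREDICTED_COUPLING = {
    ("negative", "low"): "direct",
    ("negative", "intermediate"): "uncoupled",
    ("negative", "high"): "inverse",
    ("positive", "low"): "inverse",
    ("positive", "intermediate"): "uncoupled",
    ("positive", "high"): "direct",
}


def capacity(maximal_expression: float, minimal_expression: float) -> float:
    """Capacity for induction: ratio of maximal to minimal effector expression."""
    if minimal_expression <= 0:
        raise ValueError("minimal expression must be positive")
    return maximal_expression / minimal_expression


def predicted_coupling(mode: str, capacity_class: str) -> str:
    """Look up the predicted coupling; raises KeyError for unknown labels."""
    return PREDICTED_COUPLING[(mode, capacity_class)]


if __name__ == "__main__":
    print(capacity(1000.0, 1.0))
    print(predicted_coupling("negative", "high"))
    print(predicted_coupling("positive", "high"))
```

The lookup makes the symmetry of the prediction explicit: the two modes agree at intermediate capacity and exchange direct and inverse coupling at the extremes.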

These  predictions  are  compared  with  data  available  in  the  literature  for  systems 
in  which  the  pattern  of  regulator  and  effector  gene  expression  is  known  (Fig.  7). 
They  are  found  to  be  in  reasonable  agreement,  given  measurement  error. 


Table 2. Predicted correlation between circuitry and capacity for regulation

Capacity for regulation    Mode of regulation
                           Positive             Negative

Low                        Inversely coupled    Directly coupled
Intermediate               Uncoupled            Uncoupled
High                       Directly coupled     Inversely coupled

Figure 7. Patterns of regulator and effector gene expression in inducible systems of
bacteria.  Expression  of  each  gene  is  measured  in  the  presence  of  excess  inducer  and 
normalized  with  respect  to  its  basal  level  in  the  absence  of  inducer.  For  the  effector 
gene  this  is  equivalent  to  its  capacity;  for  the  directly  coupled  regulator  gene  this 
also  is  equivalent  to  its  capacity,  but  for  the  inversely  coupled  regulator  gene  this  is 
equivalent  to  the  inverse  of  its  capacity.  Estimates  of  capacity  are  based  on 
published  reports.  Directly  coupled  (D),  uncoupled  (U),  and  inversely  coupled  (I) 
systems  are  represented  above,  on,  and  below  the  dashed  line,  respectively. 
Negatively  regulated  systems  are  shown  as  open  circles;  positively  regulated 
systems  are  shown  as  closed  circles. 


5.  Discussion 

The  genome  of  an  organism  evolves  to  realize  a  developmental  program  with 
specific  gene  circuitry  that  can  be  viewed  as  computing  the  solution  to  the 
environmental  problem  faced  by  the  organism.  This  is  a  suggestive  metaphor,  but 
at  present  we  have  little  understanding  of  the  circuits  and  the  computations  they 
might  perform.  The  large  number  of  genes  encoded  in  the  DNA  of  even  the 
simplest  of  organisms  suggests  that  this  circuitry  might  be  very  complex  and 
exhibit  a  high  degree  of  connectivity.  If  this  were  the  case,  then  the  task  of 
elucidating  the  circuitry  would  be  daunting. 

However,  a  number  of  different  lines  of  evidence  suggest  that  although  there 
may  be  a  large  number  of  gene  circuits,  they  may  have  a  minimal  degree  of 
connectivity.  First,  molecular  analysis  of  gene  regulation  in  bacteria  has  shown 
that most gene circuits are governed by a small number of regulators, usually one to
three. In eukaryotes the numbers are larger in some cases, but seldom more than a
dozen  regulators  influence  a  given  gene  circuit.  Second,  the  enumeration  of 
regulators  and  their  targets,  based  on  sequence  homologies,  has  shown  the  same 
results  for  bacteria;  namely,  one  or  two  regulators  affecting  a  given  circuit  (15,16). 
Third,  computer  simulations  of  large,  randomly-connected  circuits  have  been  used  to 
explore  the  question  of  connectivity.  The  most  biologically-suggestive  behaviors 
were  found  when  each  circuit  was  subject  to  two  or  three  regulatory  interactions,  and 
less  relevant  behaviors  were  found  with  higher  or  lower  degrees  of  connectivity  (17). 

Low  degrees  of  connectivity  suggest  that  a  ‘bottom-up’  strategy  of 
characterizing  genome  circuitry  in  terms  of  rules  for  elemental  gene  circuits  is  likely 
to  prove  fruitful.  Indeed,  this  seems  to  be  the  case  with  our  initial  experience 
attempting  to  generalize  on  the  basis  of  the  few  rules  that  we  have  uncovered  to 
date.  To  give  one  example,  consider  the  carbon  regulation  system  in  E.  coli. 

Carbon  regulation  in  E.  coli  is  manifested  in  large  part  through  the  action  of 
the cyclic AMP receptor protein (CRP)-cyclic AMP (cAMP) system (18), which
was  among  the  first  global  regulators  to  be  characterized.  This  system  coordinates 
the  utilization  of  diverse  sources  of  carbon  whose  levels  vary  in  both  time  and 
space.  An  application  of  demand  theory  indicates  that  all  of  the  regulators  in  this 
system  fit  a  self-consistent  pattern.  Because  the  CRP-cAMP  regulator  is  an 
activator  of  transcription  for  the  inducible  catabolic  systems,  one  can  predict  that  at 
least  some  of  these  systems  are  in  high  demand  in  the  organism’s  natural 
environment.  Indeed,  a  number  of  the  inducible  systems  for  non-PTS  substrates  are 
controlled  by  specific  activators  (8).  Conversely,  one  can  predict  that  the  PTS 
substrates,  which  repress  the  levels  of  CRP-cAMP,  are  seldom  present  in  high 
concentrations  in  the  natural  environment.  Indeed,  all  of  the  inducible  systems  for 
PTS  substrates  that  have  been  examined  involve  control  by  a  specific  repressor  (8), 
which  again  is  what  one  would  predict  according  to  demand  theory.  Thus,  at  least 
the modality of all the regulators in this system seems to be self-consistent.

In  conclusion,  regulation  of  gene  expression  is  clearly  one  of  the  most 
fundamental  processes  in  the  living  world.  Knowledge  of  gene  regulation  is  a 
prerequisite  for  understanding  function,  adaptation  and  evolution,  and  such 
understanding  will  in  turn  be  essential  for  the  design  and  implementation  of  novel 
metabolic  pathways  by  means  of  genetic  engineering.  The  results  of  our  studies 
suggest  that  although  there  is  an  enormous  diversity  of  mechanisms,  there  also  are 
well-established patterns that can be understood in terms of simple rules.

ABSTRACT

The  study  of  gene  regulation  has  shown  that  a  variety  of  molecular  mechanisms  are  capable  of  performing 
this  essential  function.  The  physiological  implications  of  these  various  designs  and  the  conditions  that 
might  favor  their  natural  selection  are  far  from  clear  in  most  instances.  Perhaps  the  most  fundamental 
alternative  is  that  involving  negative  or  positive  modes  of  control.  Induction  of  gene  expression  can  be 
accomplished  either  by  removing  a  restraining  element,  which  permits  expression  from  a  high-level 
promoter,  or  by  providing  a  stimulatory  element,  which  facilitates  expression  from  a  low-level  promoter. 

This  particular  design  feature  is  one  of  the  few  that  is  well  understood.  According  to  the  demand  theory 
of  gene  regulation,  the  negative  mode  will  be  selected  for  the  control  of  a  gene  whose  function  is  in  low 
demand in the organism’s natural environment, whereas the positive mode will be selected for the control
of a gene whose function is in high demand. These qualitative predictions are well supported by
experimental  evidence.  Here  we  develop  the  quantitative  implications  of  this  demand  theory.  We  define 
two key parameters: the cycle time C, which is the average time for a gene to complete an ON/OFF cycle,
and  demand  D,  which  is  the  fraction  of  the  cycle  time  that  the  gene  is  ON.  Mathematical  analysis  involving 
mutation  rates  and  growth  rates  in  different  environments  yields  equations  that  characterize  the  extent 
and  rate  of  selection.  Further  analysis  of  these  equations  reveals  two  thresholds  in  the  C  vs.  D  plot  that 
create  a  well-defined  region  within  which  selection  of  wild-type  regulatory  mechanisms  is  realizable.  The 
theory  also  predicts  minimum  and  maximum  values  for  the  demand  D,  a  maximum  value  for  the  cycle 
time  C,  as  well  as  an  inherent  asymmetry  between  the  regions  for  selection  of  the  positive  and  negative 
modes  of  control. 


Differential regulation of gene expression is
central to much of modern biology. Animal development
can be thought of in terms of an early phase,
which begins with an egg and ends with an embryo, and
a late phase, which begins with an embryo and ends
with the mature organism (Slack 1992). Some genes
function only in the early phase while others only in
the late phase. The inability to express a gene when it
should be ON or the excess expression of a gene when
it should be OFF is usually dysfunctional and often lethal.
For any given gene, expression can be considered
a roughly periodic function, which in the simplest case
is OFF for a period and ON for another period with
the total duration being the lifetime of the organism.
The differential regulation of many such genes in time
and space determines the pattern of cell-specific expression
that underlies development of the organism.

The life cycle of a bacterial association with a host organism
also can be thought of in terms of an early phase,
which begins with entry into a host organism and ends
with successful colonization, and a late phase, which
begins with colonization and ends, after a period of
stable association, with the entry of another host (Salyers
1994). Some bacterial genes function only in the
early phase of initial colonization while others only in
the late phase of stable association. Again, the inability
to express a gene when it should be ON or the excess
expression of a gene when it should be OFF is dysfunctional
and in some cases lethal. Expression of any given
gene is OFF for a period and ON for another period
with the total duration in this case being the time for
the bacteria to cycle from one host to another. Although
the organisms in these two examples are quite different,
in each case appropriate differential regulation of gene
expression is clearly key to their survival.

A great deal is known about the molecular details of
many gene systems, particularly in well-studied prokaryotic
organisms. The wealth of studies in this area has
revealed a variety of designs for the regulation of gene
expression. However, we are just beginning to understand
the functional implications of these various designs
and to grasp the factors that have influenced their
evolution.

One of the first variations in molecular design to be
addressed was negative vs. positive modes for controlling
gene expression. For example, the lactose (lac)
operon in Escherichia coli is an inducible system with a
negative mode of control by a repressor protein, the
lacI gene product (Miller and Reznikoff 1980). In an
appropriate environment, induction occurs in response
to addition of the specific inducer, which results in
removal of repressor and initiation of transcription. In
contrast, the maltose (mal) operon is an inducible system
with a positive mode of control by an activator
protein, the malT gene product (Schwartz 1987). Induction
in this case involves the specific inducer binding
to the activator protein, which is then able to interact
with RNA polymerase and facilitate initiation of transcription.
The same physiological function, induction,
is being realized in each of these cases, but by alternative
molecular mechanisms. Are these alternative designs
historical accidents that are functionally equivalent, or
have they been selected in nature because they exhibit
functional differences?

An  answer  to  this  question  was  provided  by  demand 
theory  (Savageau  1974,  1977,  1983a,  1989),  which  is 
based  on  selectionist  arguments.  In  its  simplest  form, 
the  theory  can  be  understood  in  familiar  qualitative 
terms  and  leads  to  the  following  predictions:  a  negative 
mode  of  control  will  be  selected  when  there  is  a  low 
demand  for  expression  of  the  effector  genes  in  the 
organism’s  natural  environment;  a  positive  mode  will 
be selected when there is a high demand for their expression.
These predictions, and a number of others
that  follow  as  natural  extensions,  have  been  tested  in 
over  100  cases  and  there  has  been  excellent  agreement 
(Savageau  1979,  1983b,  1985). 

Here I develop the quantitative implications of demand
theory. Models that include consideration of the
organism’s  life  cycle,  molecular  mechanisms  of  gene 
control,  and  population  dynamics  are  used  to  describe 
mutant  and  wild-type  populations  in  two  environments 
with  different  demands  for  expression  of  the  genes  in 
question.  These  models  are  analyzed  mathematically  to 
identify  conditions  that  lead  to  either  selection  or  loss 
of  a  given  mode  of  control.  It  will  be  shown  that  this 
theory  ties  together  a  number  of  important  variables, 
including  growth  rates,  mutation  rates,  minimum  and 
maximum  demands  for  gene  expression,  and  minimum 
and  maximum  durations  for  the  life  cycle  of  the  organism. 
An application of the theory is provided in the accompanying
article (Savageau 1998), where regulation of the
lac  and  mal  operons  of  E.  coli  is  analyzed  and  the  results 
are  compared  with  independent  experimental  data. 


MODELS 

Life cycle: We shall consider a given effector gene in
an organism that cycles between two alternative environments,
a high-demand environment H, and a low-demand
environment L, as shown in Figure 1. The average cycle
time required for one complete passage through both
H and L environments is denoted by C. The average
fraction of time spent in the high-demand environment
is denoted by D. Note that D also signifies demand for
expression of the regulated effector gene. If D = 0, demand
is minimal because the organism is always in the
low-demand environment; if D = 1, demand is maximal
because the organism is always in the high-demand environment.

Figure 1. The life cycle of an organism alternating between
two different environments. (A) Expression of the genes
that are specifically required for growth in the environment
labeled H is in high demand, whereas in the alternative environment
labeled L their expression is in low demand. (B)
The average time required for the organism to complete its
life cycle is denoted by C. The fraction of its cycle time spent
in environment H is denoted by D, which also represents
demand for expression of the H-specific genes.
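
The definitions of C and D above can be illustrated with a short calculation. The sketch assumes that the life cycle is recorded as a list of (time in H, time in L) pairs, one per complete cycle, and that D is taken as the time-weighted fraction of the total record spent in H. Both the record format and the choice of time weighting are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def cycle_time_and_demand(cycles: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate average cycle time C and demand D from observed cycles.

    Each element of `cycles` is (time_in_H, time_in_L) for one complete
    passage through the high-demand and low-demand environments.
    """
    records = list(cycles)
    if not records:
        raise ValueError("at least one complete cycle is required")
    total_high = sum(high for high, _ in records)
    total_low = sum(low for _, low in records)
    total = total_high + total_low
    if total <= 0:
        raise ValueError("total observed time must be positive")
    cycle_time = total / len(records)   # C: average duration of one full cycle
    demand = total_high / total         # D: fraction of time spent in H
    return cycle_time, demand


if __name__ == "__main__":
    c, d = cycle_time_and_demand([(1.0, 9.0), (3.0, 7.0)])
    print(f"C = {c}, D = {d}")
```

A record spent entirely in L gives D = 0 and a record spent entirely in H gives D = 1, matching the limiting cases described in the text.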

Gene  expression:  The  models  of  gene  expression  and 
mutation  that  will  be  treated  are  shown  schematically 
in  Figures  2  and  3.  The  effector  genes  in  each  case 
are  normally  expressed  in  environment  H  but  not  in 
environment L. To simplify the diagrams and the discussion,
we shall consider mutations in the regulatory
mechanism to be an alteration in the modulator site.
Mutations in the structural gene for the regulator protein
also can disrupt the normal interaction between
the  regulatory  protein  and  the  modulator  site  to  which 
it  binds,  and  these  will  be  suitably  accounted  for  even 
though  they  will  not  be  represented  diagrammatically 
or  discussed  in  detail.  Other  types  of  mutations  will  be 
considered  briefly  in  the  discussion  section. 

In  the  negative  mode  of  control  (Figure  2) ,  environ¬ 
ment  H  involves  expression  of  the  effector  gene  in  the 
wild-type  organism.  It  also  involves  expression  in  the 
mutants  with  a  defect  in  the  modulator  site  to  which 
the  negative  regulator  binds.  Normal  expression  is  pre¬ 
vented  in  the  mutants  with  a  defect  in  the  promoter 
site.  Environment  L  involves  the  absence  of  expression 
of  the  effector  gene  in  the  wild-type  organism  and  in 
the  mutants  with  a  defect  in  the  promoter  site.  There 
is  inappropriate  expression  in  the  mutant  with  a  defect 
in  the  modulator  site.  The  mutation  rates  between  the 
different  populations  are  as  indicated. 

In the positive mode of control (Figure 3), environment
H involves expression of the effector gene in the
wild-type organism. It also involves expression in the
mutants with a mutationally enhanced promoter site.
Normal expression is prevented in the mutants with a
defect in the modulator site. Environment L involves
the absence of expression of the effector gene in the
wild-type organism and in the mutants with a defect in
the modulator site. There is inappropriate expression
in the mutants with a mutationally enhanced promoter
site. The mutation rates between the different populations
are as indicated, but it should be noted that the
values for these parameters need not be the same for
the two modes of control.

Figure 2. Expression of genes governed by the negative
mode of control in the high-demand (H) and low-demand
(L) environments. The symbols are as follows: structural gene
for the regulator protein, R; structural gene for the effector
protein, E; nucleotide sequence for the promoter site, P; and
nucleotide sequence for the modulator site, M. The wild-type
promoter in the negative mode must be a high-level promoter
to achieve full expression upon removal of repressor, and a
functional modulator site (operator) is necessary for expression
to be turned off in the presence of repressor. The heavy
arrows indicate transcription of the effector gene. The four
diagrams in A and B represent the genotypes of the wild type
(w), promoter mutant (p), modulator mutant (m), and double
mutant (d). The mutation rates between the populations of
organisms that harbor each of these genotypes are as indicated
with the appropriate subscripts and superscripts; e.g., $m^H_{dm}$
represents the mutation rate in the high-demand environment
for production of double mutants (d) from modulator
mutants (m).

Figure 3. Expression of genes governed by the positive
mode of control in the high-demand (H) and low-demand
(L) environments. The symbols are as follows: structural gene
for the regulator protein, R; structural gene for the effector
protein, E; nucleotide sequence for the promoter site, P; and
nucleotide sequence for the modulator site, M. The wild-type
promoter in the positive mode must be a low-level promoter
for expression to be turned off upon removal of activator,
and a functional modulator site (initiator) is necessary to
achieve full expression in the presence of activator. The heavy
arrows indicate transcription of the effector gene. The four
diagrams in A and B represent the genotypes of the wild type
(w), promoter mutant (p), modulator mutant (m), and double
mutant (d). The mutation rates between the populations of
organisms that harbor each of these genotypes are as indicated
with the appropriate subscripts and superscripts; e.g.,
$m^L_{pw}$ represents the mutation rate in the low-demand
environment for production of promoter mutants (p) from wild-type
organisms (w).

Populations: All of the relevant populations and conditions
can be represented in a common abstract diagram
in which the growth rates of the individual populations
and the mutation rates between populations are
explicitly depicted (Figure 4). There will be four sets
of parameter values associated with this diagram, one
each for the negative mode in high demand, the negative
mode in low demand, the positive mode in high
demand, and the positive mode in low demand.

Figure 4. Schematic diagram representing the populations
of wild-type and mutant organisms. The symbols are
as follows: number of wild-type organisms, $X_w$; number of
promoter mutants, $X_p$; number of modulator mutants, $X_m$; and
number of double mutants, $X_d$. The growth rates of each
population are indicated by the symbol g with the relevant
subscripts, and the mutation rates between populations are
indicated by m with the appropriate subscripts. See text for
further discussion.

Assumptions: These models are based on a number
of assumptions. First, the organisms harboring these
gene systems are assumed to be otherwise isogenic. Second,
because we are interested in the conditions for
selection of the wild-type regulatory mechanism, we
shall assume that the ratio of wild-type to mutant organisms
is initially 1/10 its steady-state value and then examine
the conditions that lead to enrichment of the wild
type. Third, sites in the DNA consist of a number of
critical bases, and mutation in any one of these leads
to a loss of function in the modulator sites. The same
is true of the high-level promoter in the negative mode.
The low-level promoter in the positive mode consists of
a smaller number of critical bases, and mutation in any
of these leads to a mutationally enhanced promoter
level. Fourth, the regulator gene consists of a number
of critical bases, and mutation in any one of these leads
to a loss of the regulator function. Fifth, we will be
concerned only with the forward mutational events as
indicated in Figures 2-4. The back mutational events
can be neglected because the mutant populations will
be small, according to our criterion for selection, and
the probability of back mutation is lower than that in
the forward direction. Sixth, although our models will
account for the dynamics of the doubly mutant population,
we will neglect this aspect because the singly
mutant populations will be small and the probability of
a second mutation will make the production rate of the
doubly mutant population that much smaller. Finally,
we shall assume that expression is fully ON or fully OFF
and that both the positive and negative modes of control
have the same capacity for gene regulation (Savageau
1989), which we take to be 100 for the ratio of full
expression to basal expression.

PARAMETERS 

The macroscopic parameters in our theory can be
decomposed into constituent parameters that are defined
in terms of reference values and relative values
for mutation rates and growth rates.

Mutation rates: The reference mutation rate $\mu$ is given
by the spontaneous mutation rate per base per DNA
replication. The spontaneous mutation rate for various
structures in our model can be determined from estimates
of the spontaneous mutation rate per base and
the relative mutation rate given by the number of critical
bases that define the DNA targets for these structures.
We will consider the following relative mutation rates
in our model: $\pi$ for loss of a high-level promoter site,
$\upsilon$ for gain of a high-level promoter site, $\tau$ for loss of a
regulator's functional target site, and $\rho$ for loss of a
functional regulator protein. We can also define a relative
mutation rate $\varepsilon$ and explore the effects of gene
expression on mutation rate (Datta and Jinks-Robertson
1995; Francino et al. 1996).

Growth rates: The reference growth rate $\gamma$ is defined
as the growth rate of the wild-type organism in the
nutritionally richer of the two environments. Its value is not
critical because one can simply rescale time accordingly
and none of our results would change. The growth rates
in other circumstances can be expressed as the product
of the reference growth rate and the appropriate relative
growth rate. We will consider the following relative
growth rates in our model: $\lambda$ for mutants that have lost
normal expression of the effector gene, $\sigma$ for mutants
that exhibit superfluous expression of the effector gene,
and $\delta$ for the more nutritionally deficient of the two
environments.

Criterion for selection: Our criterion for selection is
that each mutant population shall be reduced to no
more than $\theta$ of the wild-type population. A typical value
for $\theta$ is 0.05% (Leclerc et al. 1996).

These relationships are summarized in Table 1. Numerical
estimates for these parameters are given in the
accompanying article (Savageau 1998), which provides
a specific application of the theory.

QUANTITATIVE  DEVELOPMENT  OF  THE  THEORY 

The mathematical analysis needed for this development
can be significantly reduced by taking advantage
of two fundamental symmetries in our model. First, there
is a symmetry between the promoter-mutant and
modulator-mutant populations that is evident in Figure 4. If
the subscripts p and m are simply interchanged the
model remains unchanged. This means that we need
only carry out the analysis for the promoter-mutant
population; the corresponding results for the
modulator-mutant population can then be obtained simply by
interchanging the subscripts p and m. Second, there is a
symmetry between the first and second phases of the
cycle depicted in Figure 1. If the H and L phases are
interchanged along with the symbols D and (1 - D) the
temporal pattern remains unchanged. This means that
we need only carry out the analysis from the beginning
of the H phase; the corresponding results from the
beginning of the L phase can then be obtained by
interchanging the superscripts H and L and the symbols D
and (1 - D).

[Table 1. Decomposition of macroscopic parameters into
constituent parameters. Columns: negative mode of control
(high demand, low demand) and positive mode of control
(high demand, low demand); entries not reproduced here.]

a See Figures 2-4 for definition of parameters.

b The parameters for growth rates and mutation rates in turn
determine the parameters for the rate constants in the dynamic
Equations 1-4: $\alpha_{ww} = [1 - (m_{pw} + m_{mw})]g_w$, $\alpha_{pw} = m_{pw}g_w$,
$\alpha_{pp} = (1 - m_{dp})g_p$, $\alpha_{mw} = m_{mw}g_w$, $\alpha_{mm} = (1 - m_{dm})g_m$,
$\alpha_{dm} = m_{dm}g_m$, $\alpha_{dp} = m_{dp}g_p$, $\alpha_{dd} = g_d$.

Dynamics: The equations describing the dynamic behavior
of the model in Figure 4 are

$$dX_w/dt = \alpha_{ww} X_w \tag{1}$$

$$dX_p/dt = \alpha_{pw} X_w + \alpha_{pp} X_p \tag{2}$$

$$dX_m/dt = \alpha_{mw} X_w + \alpha_{mm} X_m \tag{3}$$

$$dX_d/dt = \alpha_{dm} X_m + \alpha_{dp} X_p + \alpha_{dd} X_d, \tag{4}$$

where the numbers for each population as a function
of time are given by the symbol X with appropriate
subscripts and the first-order rate constants are given
by the symbol $\alpha$, again with appropriate subscripts. The
rate constants are in turn related to the various mutation
rates and growth rates, represented by the symbols m
and g with suitable subscripts: $\alpha_{ww} = [1 - (m_{pw} + m_{mw})]g_w$,
$\alpha_{pw} = m_{pw}g_w$, $\alpha_{pp} = (1 - m_{dp})g_p$, $\alpha_{mw} = m_{mw}g_w$,
$\alpha_{mm} = (1 - m_{dm})g_m$, $\alpha_{dm} = m_{dm}g_m$, $\alpha_{dp} = m_{dp}g_p$, $\alpha_{dd} = g_d$.
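The mapping from growth and mutation rates to the rate constants of Equations 1-4 can be sketched in a few lines of Python. This is only an illustration: the class name `PhaseParameters`, the function names, and the numerical values are assumptions made here and are not taken from Table 1 or from the accompanying article.

```python
from dataclasses import dataclass


@dataclass
class PhaseParameters:
    """Growth rates (g) and mutation rates (m) for one environment (hypothetical container)."""
    g_w: float
    g_p: float
    g_m: float
    g_d: float
    m_pw: float
    m_mw: float
    m_dm: float
    m_dp: float


def rate_constants(p):
    """First-order rate constants of Equations 1-4 from the m and g parameters."""
    return {
        "ww": (1.0 - (p.m_pw + p.m_mw)) * p.g_w,
        "pw": p.m_pw * p.g_w,
        "pp": (1.0 - p.m_dp) * p.g_p,
        "mw": p.m_mw * p.g_w,
        "mm": (1.0 - p.m_dm) * p.g_m,
        "dm": p.m_dm * p.g_m,
        "dp": p.m_dp * p.g_p,
        "dd": p.g_d,
    }


def derivatives(a, x_w, x_p, x_m, x_d):
    """Right-hand sides of Equations 1-4 for the four populations."""
    return (
        a["ww"] * x_w,
        a["pw"] * x_w + a["pp"] * x_p,
        a["mw"] * x_w + a["mm"] * x_m,
        a["dm"] * x_m + a["dp"] * x_p + a["dd"] * x_d,
    )


# Hypothetical values, chosen only to exercise the functions.
example = PhaseParameters(g_w=1.0, g_p=0.9, g_m=0.8, g_d=0.7,
                          m_pw=1e-3, m_mw=2e-3, m_dm=3e-3, m_dp=4e-3)
alpha = rate_constants(example)
print(derivatives(alpha, 1000.0, 1.0, 1.0, 0.0))
```

Note that the outflow terms balance: $\alpha_{ww} + \alpha_{pw} + \alpha_{mw} = g_w$, so mutation only redistributes the growth of the wild-type population among the three populations it feeds.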

Equations 1-4 are linear and easily solved to obtain
numbers for the wild-type and mutant populations as a
function of time. The numbers for the wild-type and
promoter-mutant populations at the end of a full period
in environment H are given in terms of the initial values
at an arbitrary time t:

$$X_w(t + DC) = X_w(t)\,\exp[\alpha^H_{ww} DC] \tag{5}$$

$$\begin{aligned} X_p(t + DC) ={}& [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\,X_w(t)\,\exp[\alpha^H_{ww} DC] \\ &+ \{X_p(t) - [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\,X_w(t)\}\,\exp[\alpha^H_{pp} DC]. \end{aligned} \tag{6}$$

These numbers then become the initial values for the
solution in environment L, and the numbers at the end
of the period in environment L are then

$$X_w(t + C) = X_w(t)\,\exp[\alpha^H_{ww} DC]\,\exp[\alpha^L_{ww}(1 - D)C] \tag{7}$$

$$\begin{aligned} X_p(t + C) ={}& X_w(t)\,\{[\alpha^L_{pw}/(\alpha^L_{ww} - \alpha^L_{pp})]\,\exp[\alpha^H_{ww} DC] \\ &\quad\times \{\exp[\alpha^L_{ww}(1 - D)C] - \exp[\alpha^L_{pp}(1 - D)C]\} \\ &\quad+ [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\,\exp[\alpha^L_{pp}(1 - D)C] \\ &\quad\times \{\exp[\alpha^H_{ww} DC] - \exp[\alpha^H_{pp} DC]\}\} \\ &+ X_p(t)\,\exp[\alpha^H_{pp} DC]\,\exp[\alpha^L_{pp}(1 - D)C]. \end{aligned} \tag{8}$$

Thus, the temporal behavior is determined by four
exponential functions with time constants that are
independent of C.
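As a check on Equations 5-8, the closed-form solution for one phase can be applied twice, first for the H phase of duration DC and then for the L phase of duration (1 - D)C. The sketch below assumes that each environment is described by a tuple `(a_ww, a_pw, a_pp)` of rate constants with `a_ww != a_pp`; the function names and the numbers are hypothetical.

```python
import math


def phase(x_w, x_p, a_ww, a_pw, a_pp, duration):
    """Closed-form solution of Equations 1 and 2 over one phase (form of Equations 5 and 6)."""
    k = a_pw / (a_ww - a_pp)
    x_w_end = x_w * math.exp(a_ww * duration)
    x_p_end = (k * x_w * math.exp(a_ww * duration)
               + (x_p - k * x_w) * math.exp(a_pp * duration))
    return x_w_end, x_p_end


def full_cycle(x_w, x_p, high, low, demand, cycle_time):
    """Propagate through the H phase and then the L phase (Equations 7 and 8)."""
    x_w, x_p = phase(x_w, x_p, *high, demand * cycle_time)
    return phase(x_w, x_p, *low, (1.0 - demand) * cycle_time)


# Hypothetical rate constants (a_ww, a_pw, a_pp) for each environment.
HIGH = (1.0, 1e-3, 0.5)
LOW = (0.8, 2e-3, 0.81)
print(full_cycle(1000.0, 1.0, HIGH, LOW, demand=0.3, cycle_time=10.0))
```

Composing two single-phase solutions in this way is algebraically the same as evaluating Equations 7 and 8 directly, which is a convenient test when transcribing the longer expression.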

The ratio of the promoter-mutant to the wild-type
numbers, which is plotted in Figure 5, yields

$$\begin{aligned} X_p(t + C)/X_w(t + C) ={}& \{[\alpha^L_{pw}/(\alpha^L_{ww} - \alpha^L_{pp})] \\ &\quad\times \{1 - \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]\} \\ &\quad+ [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})] \\ &\quad\times \{1 - \exp[(\alpha^H_{pp} - \alpha^H_{ww})DC]\} \\ &\quad\times \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]\} \\ &+ \{\exp[(\alpha^H_{pp} - \alpha^H_{ww})DC] \\ &\quad\times \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]\} \\ &\quad\times X_p(t)/X_w(t) \end{aligned}$$

or

$$X_p(t + C)/X_w(t + C) = (\text{intercept}) + (\text{slope})\,X_p(t)/X_w(t). \tag{9}$$

Figure 5. Recursive relationship for the ratio of population
sizes for promoter-mutant and wild-type organisms. The
horizontal axis gives the value of the ratio at an arbitrary time
t; the vertical axis gives the value at the subsequent time t +
C, which is one complete cycle later. Selection for the wild-type
organism is indicated when the recursive relationship,
which is the straight line given by Equation 9, has a slope
between 0 and 1 and an intercept between 0 and 0.0005. The
intersection of this line with the 45° line determines a value
for the ratio that represents a stable steady state.

Note that the intercept and slope in this expression are
both positive quantities. A slope greater than 1 implies
that the ratio tends to infinity with time and thus that
the wild-type promoter is lost. A slope between 0 and 1
implies that the ratio tends to a fixed value (given by
the intersection with the 45° line) with time and, if this
value is less than $\theta$ (the criterion for selection), that
the wild-type promoter will be preserved. An intercept
greater than $\theta$ implies loss of the wild-type promoter
no matter what the value of the slope.

Starting with any set of values for the wild-type and
promoter-mutant populations, Equations 7-9 can be
applied recursively to calculate the subsequent population
sizes and ratios as a function of time. From these
results one can determine the rate of selection of the
wild-type regulatory mechanism.
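The recursive use of Equation 9 can be illustrated with a short Python sketch. It computes the intercept and slope of the linear map from the rate constants and then iterates the ratio over successive cycles. The tuple layout `(a_ww, a_pw, a_pp)`, the function names, and the parameter values are assumptions made for this illustration.

```python
import math


def recursion_coefficients(high, low, demand, cycle_time):
    """Intercept and slope of the linear recursion in Equation 9."""
    a_ww_h, a_pw_h, a_pp_h = high
    a_ww_l, a_pw_l, a_pp_l = low
    e_h = math.exp((a_pp_h - a_ww_h) * demand * cycle_time)
    e_l = math.exp((a_pp_l - a_ww_l) * (1.0 - demand) * cycle_time)
    k_h = a_pw_h / (a_ww_h - a_pp_h)
    k_l = a_pw_l / (a_ww_l - a_pp_l)
    intercept = k_l * (1.0 - e_l) + k_h * (1.0 - e_h) * e_l
    slope = e_h * e_l
    return intercept, slope


def iterate_ratio(ratio, intercept, slope, n_cycles):
    """Apply Equation 9 repeatedly; returns the ratio at the start of each cycle."""
    history = [ratio]
    for _ in range(n_cycles):
        ratio = intercept + slope * ratio
        history.append(ratio)
    return history


# Hypothetical rate constants (a_ww, a_pw, a_pp) for each environment.
HIGH = (1.0, 1e-6, 0.5)
LOW = (0.8, 1e-6, 0.801)
b, s = recursion_coefficients(HIGH, LOW, demand=0.5, cycle_time=10.0)
trajectory = iterate_ratio(1e-2, b, s, n_cycles=50)
print(b, s, trajectory[-1])
```

When the slope lies between 0 and 1, the iterates approach the fixed point intercept/(1 - slope), which is the intersection with the 45° line described in the text.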

Steady-state pattern: The ratio of promoter-mutant
and wild-type populations increases in one environment
and decreases in the other to produce a sawtooth pattern.
Once the initial transients have died away, a repeating
pattern with two steady-state values is established.
The first value of the ratio in steady state, when
it exists, is calculated by equating the ratios on the two
sides of Equation 9 and solving to obtain the following
expression:

$$X_p/X_w = \frac{[\alpha^L_{pw}/(\alpha^L_{ww} - \alpha^L_{pp})]\{1 - \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]\} + [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\{1 - \exp[(\alpha^H_{pp} - \alpha^H_{ww})DC]\}\exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]}{1 - \exp[(\alpha^H_{pp} - \alpha^H_{ww})DC + (\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]}. \tag{10}$$

If, instead of starting the analysis at the beginning of
the period in environment H, we were to start it at the
beginning of the period in environment L, then the
results would be equivalent to those in Equations 5-10
except for an exchange of the superscripts H and L and
the symbols D and (1 - D). The second value of the
ratio in steady state, when it exists, is thus

$$X_p/X_w = \frac{[\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\{1 - \exp[(\alpha^H_{pp} - \alpha^H_{ww})DC]\} + [\alpha^L_{pw}/(\alpha^L_{ww} - \alpha^L_{pp})]\{1 - \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C]\}\exp[(\alpha^H_{pp} - \alpha^H_{ww})DC]}{1 - \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C + (\alpha^H_{pp} - \alpha^H_{ww})DC]}. \tag{11}$$

Equations 10 and 11 represent different aspects of the
same steady-state pattern. One of the two steady-state
solutions for this ratio gives the maximum value whereas
the other gives the minimum value. These values can
be used to define the extent of selection. We shall always
be interested in the maximum value of the ratio; if
this is less than the criterion for selection, then the
minimum value will certainly be less as well.
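The relationship between the two steady-state values can be checked numerically. In the sketch below, the value from Equation 10 (the ratio at the start of the H phase) is propagated through one H phase and compared with the value from Equation 11 (the ratio at the start of the L phase). Function names, the tuple layout `(a_ww, a_pw, a_pp)`, and all numbers are hypothetical.

```python
import math


def steady_ratios(high, low, demand, cycle_time):
    """Steady-state ratios at the start of H (Equation 10) and at the start of L (Equation 11)."""
    a_ww_h, a_pw_h, a_pp_h = high
    a_ww_l, a_pw_l, a_pp_l = low
    e_h = math.exp((a_pp_h - a_ww_h) * demand * cycle_time)
    e_l = math.exp((a_pp_l - a_ww_l) * (1.0 - demand) * cycle_time)
    k_h = a_pw_h / (a_ww_h - a_pp_h)
    k_l = a_pw_l / (a_ww_l - a_pp_l)
    denominator = 1.0 - e_h * e_l
    if denominator <= 0.0:
        raise ValueError("slope of Equation 9 is not below 1; no steady state exists")
    start_of_h = (k_l * (1.0 - e_l) + k_h * (1.0 - e_h) * e_l) / denominator
    start_of_l = (k_h * (1.0 - e_h) + k_l * (1.0 - e_l) * e_h) / denominator
    return start_of_h, start_of_l, k_h, e_h


# Hypothetical rate constants: the mutant is at a disadvantage in H only.
HIGH = (1.0, 1e-6, 0.9)
LOW = (0.5, 1e-4, 0.5001)
r_h, r_l, k_h, e_h = steady_ratios(HIGH, LOW, demand=0.9, cycle_time=20.0)
print("maximum:", max(r_h, r_l), "minimum:", min(r_h, r_l))
```

With these particular numbers the ratio falls during the H phase and rises during the L phase, so the value at the start of H is the maximum of the sawtooth.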

Definition of the threshold for selection: The threshold
for selection of the wild-type promoter is obtained
from the solution of Equation 10 or 11, whichever gives
the maximum value for the ratio. The values for the
growth rates and mutation rates in the high- and
low-demand environments (for either the positive or the
negative mode of control in Table 1) determine the
values for the rate-constant parameters that appear in
Equations 10 and 11. The ratio $X_p/X_w$ is then fixed with
a value equal to $\theta$, which is the criterion for selection.
The result of these parameter assignments is a nonlinear
equation involving the cycle time C and the demand for
gene expression D that defines the threshold for selection.
There is no explicit solution for C as a function of D.
However, the threshold for selection of the wild-type
promoter can be obtained by bisection (Press et al.
1988) when numerical values are assumed for the
parameters in Equation 10 or 11.
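A minimal sketch of the bisection step follows, assuming that Equation 10 gives the maximum ratio and that a bracket of cycle times with a sign change is supplied by the caller. The function names, the bracket, and the rate constants are all hypothetical; they are not the estimates of the accompanying article.

```python
import math


def steady_ratio_eq10(high, low, demand, cycle_time):
    """Steady-state ratio of Equation 10 for rate-constant tuples (a_ww, a_pw, a_pp)."""
    a_ww_h, a_pw_h, a_pp_h = high
    a_ww_l, a_pw_l, a_pp_l = low
    x_h = (a_pp_h - a_ww_h) * demand * cycle_time
    x_l = (a_pp_l - a_ww_l) * (1.0 - demand) * cycle_time
    if x_h + x_l >= 0.0:
        raise ValueError("no steady state: slope of Equation 9 is not below 1")
    k_h = a_pw_h / (a_ww_h - a_pp_h)
    k_l = a_pw_l / (a_ww_l - a_pp_l)
    # 1 - exp(x) is evaluated as -expm1(x) to limit round-off for small x.
    numerator = -k_l * math.expm1(x_l) - k_h * math.expm1(x_h) * math.exp(x_l)
    return numerator / -math.expm1(x_h + x_l)


def threshold_cycle_time(high, low, demand, theta, c_lo, c_hi, rel_tol=1e-10):
    """Bisection for the cycle time C at which the steady-state ratio equals theta."""
    f_lo = steady_ratio_eq10(high, low, demand, c_lo) - theta
    f_hi = steady_ratio_eq10(high, low, demand, c_hi) - theta
    if f_lo * f_hi > 0.0:
        raise ValueError("bracket does not contain a sign change")
    while c_hi - c_lo > rel_tol * c_hi:
        c_mid = 0.5 * (c_lo + c_hi)
        f_mid = steady_ratio_eq10(high, low, demand, c_mid) - theta
        if f_lo * f_mid <= 0.0:
            c_hi = c_mid
        else:
            c_lo, f_lo = c_mid, f_mid
    return 0.5 * (c_lo + c_hi)


# Hypothetical rate constants and criterion.
HIGH = (1.0, 1e-6, 0.9)
LOW = (0.5, 1e-4, 0.5001)
THETA = 5e-4
c_star = threshold_cycle_time(HIGH, LOW, demand=0.9, theta=THETA, c_lo=1e-3, c_hi=1e3)
print(c_star)
```

Repeating the calculation over a grid of D values traces out the threshold curve; values of D for which no sign change exists in the bracket lie outside the range where the threshold is defined.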

As noted at the beginning of this section, the corresponding
results for the modulator-mutant population
can be obtained from Equations 5-11 simply by
interchanging the subscripts p and m. We will make use of
these expressions below.

Although there is no analytical solution that gives the
thresholds for selection, their asymptotic behavior can
be determined analytically. As will be seen in the following
sections, the analytical expressions allow one to draw
general conclusions that are independent of particular
numerical values for the parameters.

Threshold for selection of a promoter with the negative
mode: The ratio of promoter-mutant and wild-type
populations is decreasing in environment H and increasing
in environment L. Thus, the maximum value
in steady state is determined from the analysis that starts
in H. The asymptotic character of the threshold for
selection of the promoter can be determined from
Equation 10. First, it should be noted from Table 1 that
$(\alpha^L_{pp} - \alpha^L_{ww}) > 0$ and $\alpha^L_{pw}/(\alpha^L_{ww} - \alpha^L_{pp}) = -1$. Second, for
typical values of the parameters, $(\alpha^H_{pp} - \alpha^H_{ww}) < 0$.

When $C \gg 1$ and $D > (\alpha^L_{pp} - \alpha^L_{ww})/[(\alpha^L_{pp} - \alpha^L_{ww}) - (\alpha^H_{pp} - \alpha^H_{ww})]$,
Equation 10 can be approximated as

$$\theta = \exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C] - 1 + [\alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]\,\exp[(\alpha^L_{pp} - \alpha^L_{ww})(1 - D)C], \tag{12}$$
where $\theta$ is the criterion for selection of the promoter.
Solving for C as a function of D yields

$$C = \frac{\log[1 + \theta] - \log[1 + \alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})]}{(\alpha^L_{pp} - \alpha^L_{ww})}\,\frac{1}{1 - D}. \tag{13}$$

The arguments of the logarithms are nearly unity, so
that

$$C = \frac{\theta - \alpha^H_{pw}/(\alpha^H_{ww} - \alpha^H_{pp})}{(\alpha^L_{pp} - \alpha^L_{ww})}\,\frac{1}{1 - D} \tag{14}$$

or

$$C = \frac{\theta - \mu\pi\varepsilon/[1 - \lambda(1 - \mu\tau - \mu\rho) - \mu\varepsilon(\pi + \tau + \rho)]}{\mu\pi\gamma\delta}\,\frac{1}{1 - D} \approx \frac{\theta}{\mu\pi\gamma\delta}\,\frac{1}{1 - D}. \tag{15}$$

Thus, the high-C asymptote in a log C vs. log D plot is
given by a line that is nearly horizontal for values of
$D \ll 1$ and that approaches infinity as D goes to unity.

When $C \ll 1$, the exponential functions in Equation
10 can be approximated by the first three terms of their
Taylor series and the resulting equation can be solved
for C as a function of D. The value of $D = D_{\min}$ that
makes C = 0 is given by

$$D_{\min} = \frac{(\alpha^L_{pp} - \alpha^L_{ww})(1 + \theta)}{(\alpha^L_{pp} - \alpha^L_{ww})(1 + \theta) - \alpha^H_{pw} - (\alpha^H_{pp} - \alpha^H_{ww})\theta} \tag{16}$$

or

$$D_{\min} = \frac{\mu\pi\delta(1 + \theta)}{\mu\pi\delta(1 + \theta) + [1 - \lambda(1 - \mu\tau - \mu\rho) - \mu\varepsilon(\pi + \tau + \rho)]\theta - \mu\pi\varepsilon} \approx \frac{\mu\pi\delta}{\theta(1 - \lambda)}. \tag{17}$$

Thus, the low-C asymptote is given by a vertical line
located at $D = D_{\min}$ in a log C vs. log D plot.

The threshold for selection of the promoter is characterized
by the combination of these high- and low-C
asymptotes as shown schematically in Figure 6.

Figure 6. Schematic representation of the thresholds for
selection of the wild-type regulatory mechanism as functions
of the cycle time and the demand for gene expression. The
threshold for selection against the promoter mutants is obtained
for a given set of parameter values by setting the ratio
$X_p/X_w$ = 0.0005 in Equation 10 or 11 and then solving for the
cycle time C as a function of the demand for gene expression D.
The threshold for selection against the modulator mutants is
obtained in a similar fashion (see text for discussion). In each
case, selection is indicated by values for C and D that lie below
the calculated threshold. Selection for the wild-type regulatory
mechanism occurs for those values of C and D that lie below
both thresholds simultaneously. These thresholds define minimum
and maximum values for demand.
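The two asymptotes for the promoter threshold in the negative mode (Equations 15 and 17) are simple enough to evaluate directly. The sketch below compares the full expressions with their approximations. The constituent parameter values are hypothetical placeholders, not the estimates given in the accompanying article, and `lam` stands for $\lambda$ because `lambda` is a reserved word in Python.

```python
import math


def high_c_asymptote(demand, theta, mu, pi_rel, tau, rho, eps, lam, gamma, delta):
    """High-C asymptote of the promoter threshold, full form of Equation 15."""
    k_h = mu * pi_rel * eps / (
        1.0 - lam * (1.0 - mu * tau - mu * rho) - mu * eps * (pi_rel + tau + rho))
    return (theta - k_h) / (mu * pi_rel * gamma * delta) / (1.0 - demand)


def d_min_full(theta, mu, pi_rel, tau, rho, eps, lam, delta):
    """Minimum demand, full form of Equation 17."""
    lead = mu * pi_rel * delta * (1.0 + theta)
    bracket = 1.0 - lam * (1.0 - mu * tau - mu * rho) - mu * eps * (pi_rel + tau + rho)
    return lead / (lead + bracket * theta - mu * pi_rel * eps)


def d_min_approx(theta, mu, pi_rel, lam, delta):
    """Minimum demand, approximate form of Equation 17."""
    return mu * pi_rel * delta / (theta * (1.0 - lam))


# Hypothetical constituent parameters, for illustration only.
P = dict(theta=5e-4, mu=1e-9, pi_rel=10.0, tau=10.0, rho=100.0,
         eps=1.0, lam=0.9, gamma=1.0, delta=0.5)

c_full = high_c_asymptote(0.0, **P)
c_approx = P["theta"] / (P["mu"] * P["pi_rel"] * P["gamma"] * P["delta"])
dmin = d_min_full(P["theta"], P["mu"], P["pi_rel"], P["tau"], P["rho"],
                  P["eps"], P["lam"], P["delta"])
dmin_a = d_min_approx(P["theta"], P["mu"], P["pi_rel"], P["lam"], P["delta"])
print(c_full, c_approx, dmin, dmin_a)
```

For parameter sets in which the mutation terms are much smaller than both $\theta$ and the selection coefficient $(1 - \lambda)$, the full and approximate forms agree closely, which is the regime the approximations in the text assume.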

Threshold for selection of a modulator (regulator)
with the negative mode: The ratio of modulator-mutant
and wild-type populations is decreasing in environment
L and increasing in environment H. Thus, the maximum
value in steady state is determined from the analysis
that starts in L. The asymptotic character of the
threshold for selection of the modulator (regulator)
can be determined from Equation 11 after interchanging
the subscripts p and m. In this case, $(\alpha^H_{mm} - \alpha^H_{ww}) > 0$,
$\alpha^H_{mw}/(\alpha^H_{ww} - \alpha^H_{mm}) = -1$ and, for typical values of the
parameters, $(\alpha^L_{mm} - \alpha^L_{ww}) < 0$.

When $C \gg 1$ and $D < (\alpha^L_{mm} - \alpha^L_{ww})/[(\alpha^L_{mm} - \alpha^L_{ww}) - (\alpha^H_{mm} - \alpha^H_{ww})]$,
Equation 11 can be approximated as

$$\theta = \exp[(\alpha^H_{mm} - \alpha^H_{ww})DC] - 1 + [\alpha^L_{mw}/(\alpha^L_{ww} - \alpha^L_{mm})]\,\exp[(\alpha^H_{mm} - \alpha^H_{ww})DC], \tag{18}$$

where $\theta$ is the criterion for selection of the modulator.
Solving for C as a function of D yields

$$C = \frac{\log[1 + \theta] - \log[1 + \alpha^L_{mw}/(\alpha^L_{ww} - \alpha^L_{mm})]}{(\alpha^H_{mm} - \alpha^H_{ww})}\,\frac{1}{D}. \tag{19}$$

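The arguments of the logarithms are nearly unity, so
that

$$C = \frac{\theta - \alpha^L_{mw}/(\alpha^L_{ww} - \alpha^L_{mm})}{(\alpha^H_{mm} - \alpha^H_{ww})}\,\frac{1}{D} \tag{20}$$

or

$$C = \frac{\theta - \mu(\tau + \rho)/[1 - \sigma(1 - \mu\pi\varepsilon) - \mu(\pi + \tau + \rho)]}{\mu(\tau + \rho)\varepsilon\gamma}\,\frac{1}{D} \approx \frac{\theta}{\mu(\tau + \rho)\varepsilon\gamma}\,\frac{1}{D}. \tag{21}$$

Thus, the high-C asymptote is given by a straight line
with slope equal to -1 in a log C vs. log D plot.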

When $C \ll 1$, the exponential functions in the steady-state
ratio can be approximated by the first three terms
of their Taylor series and the resulting equation can be
solved for C as a function of D. The value of
$D = D_{\max}$ that makes C = 0 is given by

$$D_{\max} = \frac{[-\theta(\alpha^L_{mm} - \alpha^L_{ww}) - \alpha^L_{mw}]}{[-\theta(\alpha^L_{mm} - \alpha^L_{ww}) - \alpha^L_{mw}] + (\alpha^H_{mm} - \alpha^H_{ww})(1 + \theta)} \tag{22}$$

or

$$\begin{aligned} D_{\max} &= \frac{\delta\{\theta[1 - \sigma(1 - \mu\pi\varepsilon) - \mu(\pi + \tau + \rho)] - \mu(\tau + \rho)\}}{\delta\{\theta[1 - \sigma(1 - \mu\pi\varepsilon) - \mu(\pi + \tau + \rho)] - \mu(\tau + \rho)\} + \mu(\tau + \rho)\varepsilon(1 + \theta)} \\ &\approx \frac{1}{1 + \mu(\tau + \rho)\varepsilon/[\delta\theta(1 - \sigma)]}. \end{aligned} \tag{23}$$

Thus, the low-C asymptote is given by a vertical line
located at $D = D_{\max}$ in a log C vs. log D plot.

The threshold for selection of the modulator (regulator)
is characterized by the combination of these
high- and low-C asymptotes as shown schematically in
Figure 6.

Region in which selection for the negative mode of
control is realizable: Selection for both wild-type promoter
and wild-type modulator (regulator) requires values
of C and D that lie in the shaded region below the
two thresholds shown schematically in Figure 6. The
low-C asymptotes of these thresholds (Equations 17 and
23) define the minimum $D_{\min}$ and maximum $D_{\max}$ values
of the demand for gene expression. The intersection
of the two thresholds yields a prediction for maximum
cycle time $C_{\max}$. As shown elsewhere, with numerical
estimates for the various parameters, the theory predicts
other more relevant values not only for maximum cycle
time, but also for minimum cycle time and optimal cycle
time (Savageau 1998). Thus, the thresholds define a
region of the C vs. D plot within which selection for the
wild-type regulatory mechanism is realizable and outside
of which it is not.
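The boundaries of this region can be estimated from the approximate forms of Equations 17 and 23, and a rough location for the crossing of the two thresholds follows from equating the approximate high-C asymptotes of Equations 15 and 21. The sketch below does this under stated assumptions: the asymptotes are used in place of the exact thresholds, so the crossing point is only an estimate of where the maximum cycle time lies, and all parameter values and names are hypothetical.

```python
import math


def region_summary(theta, mu, pi_rel, tau, rho, eps, lam, sigma, gamma, delta):
    """Approximate D_min, D_max and the crossing of the two high-C asymptotes."""
    d_min = mu * pi_rel * delta / (theta * (1.0 - lam))                             # approx. Equation 17
    d_max = 1.0 / (1.0 + mu * (tau + rho) * eps / (delta * theta * (1.0 - sigma)))  # approx. Equation 23
    # Equating theta/(mu*pi*gamma*delta*(1-D)) with theta/(mu*(tau+rho)*eps*gamma*D)
    # gives the demand at which the approximate asymptotes cross.
    d_cross = pi_rel * delta / (pi_rel * delta + (tau + rho) * eps)
    c_cross = theta / (mu * pi_rel * gamma * delta * (1.0 - d_cross))
    return d_min, d_max, d_cross, c_cross


# Hypothetical constituent parameters, for illustration only.
Q = dict(theta=5e-4, mu=1e-9, pi_rel=10.0, tau=10.0, rho=100.0, eps=1.0,
         lam=0.9, sigma=0.9, gamma=1.0, delta=0.5)
d_min, d_max, d_cross, c_cross = region_summary(**Q)
print(d_min, d_max, d_cross, c_cross)
```

A region of realizable selection exists in this approximation only when `d_min < d_max`, and the crossing is meaningful only when `d_cross` falls between the two.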

Existence of a region of realizable selection for the
negative mode: Clearly, $D_{\max} > D_{\min}$ is required for a
region of realizable selection to exist. These boundaries
for selection are strongly influenced by the selection
coefficients ($1 - \lambda$ and $1 - \sigma$), which are related to
the differences in growth rates for wild-type and mutant
organisms. This is seen most clearly for the simplified
case in which all relative mutation rates are equal to
unity and all mutants have the same reduction in growth
rate. The inequality involving Equations 17 and 23 yields
a critical value for the selection coefficients; selection
of the wild-type regulatory mechanism is possible only
when the selection coefficients exceed this critical value:

$$(1 - \lambda) = (1 - \sigma) > \frac{\mu(1 + \delta)}{2\theta}\left[1 + \sqrt{1 + \frac{4(1 - \delta)}{(1 + \delta)^2}}\,\right]. \tag{24}$$

This can be seen graphically in Figure 7 where the
thresholds for selection are plotted for different values
of the selection coefficients.

Discriminate selection for the negative mode of control:
When the reduction in growth rate for the mutants
is sufficiently small (<0.0005% in this illustration)
there is no overlap beneath the thresholds. No selection
for the wild-type regulatory mechanism is possible when
the selection pressure is too weak. When the reduction
in growth rate has an intermediate value (between
0.0005 and 0.01% in this illustration) there is a significant
and well-delineated overlap beneath the thresholds.
Discriminate selection for the wild-type regulatory
mechanism occurs within a range of relatively low values
for demand, but not outside it. When the reduction
in growth rate is sufficiently large (>0.01% in this
illustration) the overlap is so large that it encompasses
almost the entire range of values for demand. Indiscriminate
selection for the wild-type regulatory mechanism
occurs under these conditions.

Threshold  for  selection  of  a  promoter  with  the  posi¬ 
tive  mode:  The  ratio  of  promoter-mutant  and  wild-type 
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Solving  for  C  as  a  function  of  1  —  D  yields 

r  _  log[l  +  6]  -  1or[1  +  <C/(otL  ~  «},p)] 
c  faH  -  otH  ) 


X 


1  -  (1  -  D) 


(26) 


or 


0  -  ^v/[l  ~  cr(l  ~  p,(T  +  p)e)  ~  p,(v  +  t  +  p)] 
tiueyS 


1  -  (1  -  D) 

e _ l 

1  —  (1  —  D) 


(27) 


Thus,  the  high-C  asymptote  in  a  log  C  vs.  log(l  —  D) 
plot  is  given  by  a  line  that  is  nearly  horizontal  for  values 
of  (1  -  D)  <  1  and  that  approaches  infinity  as  (1  —  D) 
goes  to  unity. 

When  C<  1,  the  exponential  functions  in  Equation 
11  can  be  approximated  by  the  first  three  terms  of  their 
Taylor  series  and  the  resulting  equation  can  be  solved 
for  C  as  a  function  of  1  —  D.  The  value  of  1  —  D  =  1  — 
Z)max  that  makes  C  ~  0  is  given  by 

= _ (ag>  ~  0(1  +  6) _ 

(otpp  “  <0(1  +  0)  -  QtpW  -  (otpp  -  (1^)0 


Figure  7. — Discriminate  selection  for  wild-type  regulatory 
mechanisms  with  alternative  modes  of  control  requires  inter¬ 
mediate  values  for  the  selection  coefficients.  Results  (A-F) 
are  shown  for  the  negative  mode  in  a  simplified  case  (see 
text  for  discussion).  When  selection  coefficients  are  too  low 
(<0.0005%) ,  there  is  no  selection  for  the  wild  type.  At  interme¬ 
diate  values  (0.0005-0.01%),  discriminate  selection  for  the 
wild  type  occurs  at  relatively  low  values  of  demand.  When  sel¬ 
ection  coefficients  are  too  high  (>0.01%),  selection  for  the 
wild-type  regulatory  mechanism  occurs  indiscriminately  at 
nearly  all  values  of  demand.  The  results  for  the  positive  mode 
are  similar,  except  that  discriminate  selection  occurs  at  rela¬ 
tively  high  values  of  demand. 


populations  is  decreasing  in  environment  L  and  increas¬ 
ing  in  environment  H.  Thus,  the  maximum  value  in 
steady  state  is  determined  from  the  analysis  that  starts 
in  L.  The  asymptotic  character  of  the  threshold  for 
selection  of  the  promoter  can  be  determined  from 
Equation  11.  In  this  case,  it  can  be  seen  from  Table  1 
that  (ctpp  -  a”.)  >  0,  a”/(c&  -  a”)  =  -1  and,  for 
typical  values  of  the  parameters,  («[,,,  -  <  0. 

When  C  >  1  and  (1  -  D)  >  (a“  -  ajl)/[(a”  - 
a!JL)  “  (otpp  -  O  ] ,  Equation  11  can  be  approximated 
as 

0  =  exp  [  (app  -  a^)Z)C]  -  1 

+  K/CaU  -  otpp)]  exp [(apP  -  a*,)DQ.  (25) 


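or

\[
1 - D_{\max} = \frac{\mu\nu\varepsilon\delta(1 + \theta)}{\mu\nu\varepsilon\delta(1 + \theta) + [1 - \sigma(1 - \mu(\tau + \rho)\varepsilon) - \mu(\nu + \tau + \rho)]\theta - \mu\nu} \approx \frac{\mu\nu\varepsilon\delta}{\theta(1 - \sigma)}.
\tag{29}
\]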

Thus, the low-C asymptote is given by a vertical line located at 1 − D = 1 − D_max in a log C vs. log(1 − D) plot.

The threshold for selection of the promoter in this case is characterized by high- and low-C asymptotes that are similar to those for the negative mode shown schematically in Figure 6, except that the horizontal axis is given by log(1 − D) rather than log D (data not shown).

Threshold for selection of a modulator (regulator) with the positive mode: The ratio of modulator-mutant and wild-type populations is decreasing in environment H and increasing in environment L. Thus, the maximum value in steady state is determined from the analysis that starts in H. The asymptotic character of this threshold can be determined from Equation 10 after interchanging the subscripts p and m. In this case, $(\alpha^{L}_{mm} - \alpha^{L}_{ww}) > 0$, $\alpha^{L}_{mw}/(\alpha^{L}_{ww} - \alpha^{L}_{mm}) = -1$ and, for typical values of the parameters, $(\alpha^{H}_{mm} - \alpha^{H}_{ww}) < 0$.

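When C ≫ 1 and $(1 - D) < (\alpha^{H}_{mm} - \alpha^{H}_{ww})/[(\alpha^{H}_{mm} - \alpha^{H}_{ww}) - (\alpha^{L}_{mm} - \alpha^{L}_{ww})]$, Equation 10 can be approximated as

\[
\theta = \exp[(\alpha^{L}_{mm} - \alpha^{L}_{ww})(1 - D)C] - 1 + \frac{\alpha^{H}_{mw}}{\alpha^{H}_{ww} - \alpha^{H}_{mm}} \exp[(\alpha^{L}_{mm} - \alpha^{L}_{ww})(1 - D)C].
\tag{30}
\]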

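Solving for C as a function of 1 − D yields

\[
C = \frac{\log[1 + \theta] - \log[1 + \alpha^{H}_{mw}/(\alpha^{H}_{ww} - \alpha^{H}_{mm})]}{\alpha^{L}_{mm} - \alpha^{L}_{ww}} \cdot \frac{1}{1 - D}
\tag{31}
\]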


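or

\[
C = \frac{\theta - \mu(\tau + \rho)\varepsilon/[1 - \lambda(1 - \mu\nu) - \mu\varepsilon(\nu + \tau + \rho)]}{\mu(\tau + \rho)\gamma} \cdot \frac{1}{1 - D} \approx \frac{\theta}{\mu(\tau + \rho)\gamma} \cdot \frac{1}{1 - D}.
\tag{32}
\]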

Thus, the high-C asymptote is given by a straight line with slope equal to −1 in a log C vs. log(1 − D) plot.
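
The slope of the high-C asymptote can be checked numerically. The sketch below is illustrative only: it assumes the asymptote has the form C = K/(1 − D), and the constant `K` and both function names are hypothetical stand-ins rather than values or code from the article.

```python
import math

def high_c_asymptote(one_minus_d, k):
    """Cycle time on the assumed high-C asymptote, C = K / (1 - D)."""
    return k / one_minus_d

def loglog_slope(f, x1, x2):
    """Slope of f between x1 and x2 when both axes are logarithmic."""
    rise = math.log10(f(x2)) - math.log10(f(x1))
    run = math.log10(x2) - math.log10(x1)
    return rise / run

K = 1.0e3  # hypothetical constant; its value does not affect the slope
slope = loglog_slope(lambda x: high_c_asymptote(x, K), 1e-4, 1e-1)
print(round(slope, 6))
```

Because log C = log K − log(1 − D), the slope is −1 whatever value is chosen for K; only the vertical position of the line changes.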

When C ≪ 1, the exponential functions in the steady-state ratio can be approximated by the first three terms of their Taylor series and the resulting equation can be solved for C as a function of 1 − D. The value of 1 − D = 1 − D_min that makes C ≈ 0 is given by

\[
1 - D_{\min} = \frac{-\alpha^{H}_{mw} - (\alpha^{H}_{mm} - \alpha^{H}_{ww})\theta}{-\alpha^{H}_{mw} - (\alpha^{H}_{mm} - \alpha^{H}_{ww})\theta + (\alpha^{L}_{mm} - \alpha^{L}_{ww})(1 + \theta)}
\tag{33}
\]

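or

\[
1 - D_{\min} = \frac{\delta\{\theta[1 - \lambda(1 - \mu\nu) - \mu\varepsilon(\nu + \tau + \rho)] - \mu(\tau + \rho)\varepsilon\}}{\delta\{\theta[1 - \lambda(1 - \mu\nu) - \mu\varepsilon(\nu + \tau + \rho)] - \mu(\tau + \rho)\varepsilon\} + \mu(\tau + \rho)(1 + \theta)} \approx \frac{1}{1 + \mu(\tau + \rho)/[\delta\theta(1 - \lambda)]}.
\tag{34}
\]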

Thus, the low-C asymptote is given by a vertical line located at 1 − D = 1 − D_min in a log C vs. log(1 − D) plot.

The threshold for selection of the modulator (regulator) in this case is characterized by high- and low-C asymptotes that are similar to those for the negative mode shown schematically in Figure 6, except that the horizontal axis is given by log(1 − D) rather than log D (data not shown).

Discriminate selection for the positive mode of control: The results for the positive mode of control are completely symmetrical to those obtained for the negative mode of control under the simplifying conditions in Figure 7; one need only replace D by (1 − D). When the percentage reduction in growth rate for the mutants is small, no selection for the wild-type regulatory mechanism is possible. At intermediate percentages, discriminate selection for the positive mode of control occurs within a well-delineated range of relatively high values for demand, but not outside this range. At large percentages, selection occurs indiscriminately at nearly all values for demand, and, given the above results for the negative mode, one would expect positive and negative modes of control to arise at random with nearly equal probability. Such indiscriminate selection is inconsistent with the experimental evidence, which suggests discriminate selection of negative and positive modes of control based on demand for gene expression (Savageau 1989).

Asymmetric regions in which selection for the alternative modes is realizable: The simplified case examined in Figure 7 suggests completely symmetric regions in which selection for the alternative modes occurs. Put another way, the region for the positive mode with 1 − D as the horizontal axis is identical to that for the negative mode with D as the horizontal axis. This implies that the value of D_max (Equation 23) for the negative mode is equal to the value of 1 − D_min (Equation 34) for the positive mode. This would be true if the following conditions were satisfied: θ_N = θ_P, μ_N = μ_P, τ_N = τ_P, ρ_N = ρ_P, ε_N = ε_P = 1, σ_N = λ_P, π_N = ν_P. While it is reasonable to assume that the first four conditions are satisfied (criterion for selection θ, mutation rate μ, size of the modulator target τ, and size of the regulator ρ are the same for both the negative N and positive P mode), it is very unlikely that the last three would ever be satisfied. There is evidence that gene expression has an influence on mutation rate (ε ≠ 1), that the reduction in growth rate due to superfluous gene expression is less than that due to the loss of normal gene expression (σ_N > λ_P), and that down-promoter mutations in the negative mode are more frequent than up-promoter mutations in the positive mode (π_N > ν_P). From these considerations we can predict asymmetric regions in which selection for the alternative modes is realizable. Furthermore, because loss of normal expression typically causes a more significant reduction in growth rate than superfluous expression, we can predict that the realizable region for selection of the positive mode is greater than that for the negative mode.

Time course of selection: If we start with each mutant ratio (X_p/X_w and X_m/X_w) at some value larger than its steady-state value, then these mutant ratios will monotonically decrease with time, as can be seen from Figure 5. Equivalently, the wild-type regulatory mechanism is enriched with time, since the ratio of wild-type to mutant organisms X_w/(X_m + X_p) is equal to the reciprocal of the mutant fraction, which we define as f_m. The temporal behavior of the populations is a function of the demand for gene expression D. However, the behavior is independent of the cycle time C in the following sense. The time scale is actually discrete, given by values of nC, where n is the number of cycles. Thus, within a fixed time period, the same degree of enrichment can be achieved with either a large value for C and a small number n or a small value for C and a larger number n.

Extent of selection: While there is selection for the wild-type regulatory mechanism throughout the region of overlap beneath the thresholds (e.g., Figure 6), the extent of the selection varies as a function of cycle time C and demand D. We define the extent of selection as the steady-state value of X_w/(X_m + X_p), which is the inverse of the mutant fraction in the population (1/f_m). For a given value of C < C_max, one mutant population increases as the corresponding threshold is approached; it dominates the mutant fraction and the extent of selection reaches its minimum (1/θ). Similarly, the second mutant population increases as the other threshold is approached; it dominates the mutant fraction and the extent of selection again reaches its minimum. Thus, the extent of selection reaches its maximum at a value of D that is intermediate between its threshold values.

Rate of selection: Equations 7-9 can be applied recursively to calculate population sizes and ratios as a function of time. The rate at which selection occurs is independent of cycle time, as noted above. We define response time as the time required for the ratio X_w/(X_m + X_p) to reach 99% of its steady-state value starting from an initial state in which the numbers of the two types of mutants are equal and the ratio is equal to 1/10 of its steady-state value. Recall that the time points are given in units of nC, where C is the cycle time and n is the number of cycles. The same temporal behavior is obtained regardless of whether C is large (n small) or small (n large). However, the resolution is poorer for large values of C because the minimum value of n is 1. There is no analytical expression for response time, but it is readily determined by numerical means in specific cases, as can be seen in the following application (Savageau 1998).
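
The definition of response time, and the loss of resolution at large C, can be illustrated with a toy calculation. The sketch below does not implement Equations 7-9: it assumes, purely for illustration, that the ratio relaxes exponentially toward its steady state at a hypothetical rate per hour, and it samples that curve only at the discrete times nC. The function name, the relaxation law, and the rate value are all assumptions.

```python
import math

def response_time(cycle_time, rate, start_fraction=0.1,
                  target_fraction=0.99, max_cycles=1_000_000):
    """Smallest time n*C (n >= 1) at which the ratio, expressed as a
    fraction of its steady-state value, reaches target_fraction.

    The ratio starts at start_fraction of steady state and is assumed
    (toy model, not Equations 7-9) to close the gap exponentially.
    """
    initial_gap = 1.0 - start_fraction
    for n in range(1, max_cycles + 1):
        elapsed = n * cycle_time
        ratio = 1.0 - initial_gap * math.exp(-rate * elapsed)
        if ratio >= target_fraction:
            return elapsed
    raise ValueError("target not reached within max_cycles")

RATE = 0.01  # hypothetical relaxation rate, per hour
for cycle_time in (10, 100, 200):
    print(cycle_time, response_time(cycle_time, RATE))
```

In this toy model the continuous-time answer is ln(90)/0.01, about 450 hr. Short cycles recover it closely, whereas long cycles overshoot because the time axis can only advance in whole steps of C.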


DISCUSSION 

Demand theory of gene regulation predicts that the molecular mode of control is correlated with the demand for gene expression in the organism’s natural environment (Savageau 1989). The quantitative development presented in this article not only confirms and quantifies the previous qualitative predictions, but it also identifies critical factors and reveals new relationships.

The recursive equations that characterize the population dynamics of mutant and wild-type organisms (Equations 7-9) allow one to predict the time course for selection. The form of these equations also allows one to predict that the response time for selection is independent of the cycle time C, whereas it is strongly dependent upon the demand for gene expression D. The steady-state solution of the recursive equations provides estimates for the extent of selection (Equations 10 and 11). A threshold for selection is determined by the relationship between cycle time C and demand D that results when the extent of selection is set equal to the criterion for selection.

The thresholds for selection in the C vs. D plot define regions within which selection of the positive or negative mode of regulation is realizable (Figure 6). Their intersection defines a maximum value for the cycle time C_max, and their asymptotes define minimum D_min and maximum D_max values of the demand for gene expression. These regions also exhibit an inherent asymmetry that favors selection of the positive mode.

As can be seen from the asymptotic expressions for D_min and D_max (Equations 17, 23, 29, and 34), the ratio of mutation rate to selection coefficient is the most relevant determinant of the allowed region for selection. Indeed, if the target sizes for the various types of mutations and the selection coefficients are increased by the same order of magnitude, then the results are essentially unchanged.

These predictions, and others that are made possible by the assignment of specific values for the parameters, are examined further in the accompanying article (Savageau 1998), where we apply this theory to the regulation of the lactose and maltose operons of Escherichia coli.

The quantitative version of demand theory presented in this study provides a framework for further development. Other types of mutations can be incorporated in a relatively straightforward manner. Mutations that result in a phenotype similar to that of an existing mutation can be included by simply adding their target size, as was done here for mutations in the regulator gene and in the modulator site to which the regulator binds (τ + ρ). Mutations in the structural gene for the effector protein could be included by adding the appropriate target size to the target size of the promoter (π), in the case of the negative mode, or the modulator/regulator (τ + ρ), in the case of the positive mode. Similarly, in this study we have emphasized the predominant types of mutations that disrupt normal function. Those that might augment normal function can be considered by again adding their target size to the target size of another mutation that results in a similar phenotype. For example, a mutation in an operator site might result in tighter binding of the cognate repressor and failure to allow induction of gene expression in the high-demand environment. Such a mutant would exhibit the same phenotype as the promoter mutants we have considered. The target size for mutations that augment binding, which is presumably smaller than the target size for mutations that disrupt the normal operator, can be added to the target size for mutations in the promoter (π).

Mutants that result in phenotypes different from those considered here also can be added in a straightforward manner. In these cases, one first calculates the individual threshold for each class of mutation; this may involve entirely different sets of parameters and not just a different target size for mutation. Then one adds these thresholds to obtain the region of allowable selection for the wild-type regulatory system. For the cases described in the previous paragraph, this method and the method of simply adding the appropriate target sizes produce the same results (data not shown).

In summary, the quantitative development of demand theory reveals unexpected relationships between the demand for gene expression D and the average ON/OFF cycle time for the gene C, which is a manifestation of the organism’s life cycle. The theory provides equations for the rate and extent of selection, and these reveal well-defined regions of the C vs. D plot within which selection is realizable. The realizable regions for the positive and negative mode exhibit an inherent asymmetry with characteristic values for D_min, D_max, and C_max. The demand theory of gene regulation can be extended within the framework presented here to include organisms with life cycles that are more complex than the two phases illustrated in this article and regulatory systems that are more complex than a single mechanism of gene control.

ABSTRACT

Induction  of  gene  expression  can  be  accomplished  either  by  removing  a  restraining  element  (negative 
mode  of  control)  or  by  providing  a  stimulatory  element  (positive  mode  of  control).  According  to  the 
demand  theory  of  gene  regulation,  which  was  first  presented  in  qualitative  form  in  the  1970s,  the  negative 
mode  will  be  selected  for  the  control  of  a  gene  whose  function  is  in  low  demand  in  the  organism’s  natural 
environment,  whereas  the  positive  mode  will  be  selected  for  the  control  of  a  gene  whose  function  is  in 
high  demand.  This  theory  has  now  been  further  developed  in  a  quantitative  form  that  reveals  the  importance 
of  two  key  parameters:  cycle  time  C,  which  is  the  average  time  for  a  gene  to  complete  an  ON/OFF  cycle, 
and  demand  D,  which  is  the  fraction  of  the  cycle  time  that  the  gene  is  ON.  Here  we  estimate  nominal 
values  for  the  relevant  mutation  rates  and  growth  rates  and  apply  the  quantitative  demand  theory  to  the 
lactose and maltose operons of Escherichia coli. The results define regions of the C vs. D plot within which
selection  for  the  wild-type  regulatory  mechanisms  is  realizable,  and  these  in  turn  provide  the  first  estimates 
for  the  minimum  and  maximum  values  of  demand  that  are  required  for  selection  of  the  positive  and 
negative  modes  of  gene  control  found  in  these  systems.  The  ratio  of  mutation  rate  to  selection  coefficient 
is  the  most  relevant  determinant  of  the  realizable  region  for  selection,  and  the  most  influential  parameter 
is  the  selection  coefficient  that  reflects  the  reduction  in  growth  rate  when  there  is  superfluous  expression 
of  a  gene.  The  quantitative  theory  predicts  the  rate  and  extent  of  selection  for  each  mode  of  control.  It 
also  predicts  three  critical  values  for  the  cycle  time.  The  predicted  maximum  value  for  the  cycle  time  C 
is  consistent  with  the  lifetime  of  the  host.  The  predicted  minimum  value  for  C  is  consistent  with  the  time 
for  transit  through  the  intestinal  tract  without  colonization.  Finally,  the  theory  predicts  an  optimum  value 
of  C  that  is  in  agreement  with  the  observed  frequency  for  E.  coli  colonizing  the  human  intestinal  tract. 

The life cycle of a microbe, in the simplest case, consists of alternative phases. The demand for expression of some effector genes will be high in one phase and low in the other, and adapting the level of expression to this varying demand requires a functional regulatory mechanism. It has long been known that the same regulatory function, for example, induction of gene expression, can be accomplished in one of two different modes: the negative mode involves the removal of a restraining element, which permits expression from a high-level promoter, whereas the positive mode involves the provision of a stimulatory element, which facilitates expression from a low-level promoter. The demand theory of gene regulation provides a selectionist explanation for this fundamental duality (e.g., see Savageau 1977, 1989).

The components of a minimal regulatory mechanism consist of a promoter site, a modulator site, and a regulator gene encoding the protein that binds the modulator site in response to environmental cues. Each of these components is subject to a rate of mutation that is determined by the number of critical bases in its nucleotide sequence and the mutation rate per base per round of DNA replication. A mutant altered in one of these components may exhibit two different phenotypes depending upon the phase of the life cycle in which it is expressed. The growth rate of the organism serves as the relevant phenotype, and selection is based upon differences in growth rate among wild-type and mutant organisms.

The quantitative development of demand theory (Savageau 1998) combines these elements of life cycle, ecology, physiology, and molecular genetics to predict regions of the C vs. D plot within which selection for the wild-type regulatory mechanisms is realizable. These regions define minimum and maximum values for demand. This theory ties together a number of important variables, including growth rates, mutation rates, and minimum and maximum demands for gene expression. We apply demand theory here to the lactose (lac) and maltose (mal) catabolic systems of Escherichia coli, and we show that this theory also yields predictions for the rate and extent of selection and for minimum, maximum, and optimal cycle times of E. coli that are in reasonable agreement with independent experimental data.

Figure 1. The life cycle of Escherichia coli alternates between two different environments, the proximal portions of the digestive tract (A), where lactose levels are relatively high and maltose levels relatively low, and the distal portions (B), where lactose levels are relatively low and maltose levels relatively high. In the environment labeled A, there is a high demand for expression of the A-specific lac genes and a low demand for expression of the B-specific mal genes, whereas in the alternative environment labeled B the demand for these same genes is reversed. As described in the text, the average time required for the organism to complete its life cycle is denoted by C, which also represents the average time for the A-specific (or B-specific) genes to complete their ON/OFF cycle. The fraction of this cycle time spent in environment A (or B), which also represents the demand for expression of the A-specific (or B-specific) genes, is denoted by D.

Life cycle of E. coli: The normal life cycle of an organism defines the demand for expression of its effector
genes. The life cycle of E. coli as it passes from one host to another will be considered here in terms of two different environments (Figure 1). The first will be identified with the proximal portions of the digestive tract for a lactose-tolerant host that ingests both lactose and starch (which consists largely of maltose). This is the environment in which rapid growth occurs during the transition between stable association with one host and then another. The second environment will be identified with the distal end of the small intestine and the colon of the host in which colonization and slow growth take place. Also included in the second environment will be the host’s surroundings through which the bacteria pass to enter a subsequent host. This is admittedly a simplification of a more complex ecology (Cooke 1974; Freter 1976; Savageau 1983), but nevertheless it captures the essential features for our purposes here. Additional environments and more complex linkages among them in principle can be handled by the same methods.

Ecology and gene expression: We shall consider the lac operon as representative of a low-demand function governed by the negative mode of control (Miller and Reznikoff 1980) and the mal operon as representative of a high-demand function governed by the positive mode of control (Schwartz 1987). These systems are well studied at the molecular level, and the evidence regarding their mode of control is clear.

Evidence regarding demand for expression of the lac and mal operons of E. coli comes from studies of intestinal ecology. Lactose is a relatively rare sugar in nature (Shallenberger 1974). The host’s lactase enzymes, which hydrolyze this disaccharide and thereby permit its utilization by the host, are located in the proximal small intestine and are subject to developmental regulation (Dahlqvist 1961; Koldovsky and Chytil 1965; Walker 1968). In contrast, maltose, the breakdown product of all dietary starch, is among the most abundant sugars (Widdas 1971). The host’s maltase enzymes, which hydrolyze this disaccharide and thereby permit its utilization by the host, are located at the distal end of the small intestine and in the colon (Dahlqvist 1961; Rosensweig and Herman 1968). This information suggests that the lac operon of E. coli is likely to be expressed at high levels in the first environment and at low levels in the second, whereas the mal operon is likely to be expressed at high levels in the second environment and at low levels in the first.

The time required for E. coli to pass through the high-demand environment for lactose utilization is about 3 hr. This is one-half the average time required to reach the colon (Madsen 1992); the 3-hr figure is also based on measured patterns of lactose utilization (Bond and Levitt 1976; Malagelada et al. 1984). Much of the ingested lactose is hydrolyzed to constituent sugars that are absorbed by the host in the proximal small intestine; the remainder is catabolized by the bacteria so that very little lactose normally reaches the colon (Bond and Levitt 1976). From this 3-hr figure for time in the high-demand environment, one can predict that the cycle time of E. coli will be inversely related to the demand for expression of its lactose operon: C = 3/D.

We estimate the time for passage through the low-demand environment for maltose utilization to be ~6 hr. This is the average time required for a bolus of ingested food to reach the distal portions of the small intestine and colon (Madsen 1992). We assume that free maltose is sparse in the proximal portions of the small intestine and that it becomes abundant only in the distal portion of the intestinal tract where the host’s maltase enzymes are localized (Dahlqvist 1961; Rosensweig and Herman 1968). From this 6-hr figure for time in the low-demand environment, one can predict that the cycle time of E. coli will be inversely related to 1 minus the demand for expression of its maltose operon: C = 6/(1 − D).
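
The two relationships between cycle time and demand can be written out as a short calculation. This is a minimal sketch, assuming the 3-hr and 6-hr residence times quoted above; the function names and the demand values used in the example are illustrative choices, not estimates from the article.

```python
def cycle_time_from_lac_demand(demand, hours_in_high_demand=3.0):
    """Cycle time (hr) for the lac operon: C = 3 / D."""
    if not 0.0 < demand <= 1.0:
        raise ValueError("demand must lie in (0, 1]")
    return hours_in_high_demand / demand

def cycle_time_from_mal_demand(demand, hours_in_low_demand=6.0):
    """Cycle time (hr) for the mal operon: C = 6 / (1 - D)."""
    if not 0.0 <= demand < 1.0:
        raise ValueError("demand must lie in [0, 1)")
    return hours_in_low_demand / (1.0 - demand)

# Illustrative demand values only.
lac_cycle = cycle_time_from_lac_demand(0.1)   # 3 hr ON out of a 30-hr cycle
mal_cycle = cycle_time_from_mal_demand(0.2)   # 6 hr OFF out of a 7.5-hr cycle
print(lac_cycle, mal_cycle)
```

In both cases the fixed residence time pins one phase of the cycle, so choosing a demand D fixes the total cycle time C.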

ESTIMATION OF PARAMETER VALUES

The demand theory of gene regulation involves three levels of parameters (Savageau 1998): constituent parameters, individual growth rates and mutation rates, and macroscopic rate constants. Estimates for the values of the constituent parameters in our model are given below. In a subsequent section we will determine the consequences of these choices by examining other values for each of the parameters. As we shall see, the exact values are not critical for about half of the parameters, whereas the values for two of them are extremely influential.

TABLE 1

Definitions and nominal values for the constituent parameters that determine the growth rates and mutation rates for organisms with positive and negative modes of control in high- and low-demand environments

                                                                                            Nominal value^a
Symbol  Definition                                                                          Negative control              Positive control
μ       Reference mutation rate                                                             6E-10 base^-1 generation^-1   6E-10 base^-1 generation^-1
π       Mutation rate, relative to μ, for loss of a promoter with negative control          10                            -
ν       Mutation rate, relative to μ, for gain of an up-promoter with positive control      -                             1
τ       Mutation rate, relative to μ, for loss of a regulator’s functional target site      20                            20
ρ       Mutation rate, relative to μ, for loss of a functional regulator protein            60                            60
ε       Mutation rate, relative to μ, when expression is increased 100-fold                 1                             1
γ       Reference growth rate in the nutritionally richer of the two environments           1.0 generation hour^-1        1.0 generation hour^-1
δ       Growth rate, relative to γ, when in the more nutritionally deficient environment    0.0125                        0.0125
λ       Growth rate, relative to γ, when there is a loss of expression with negative control       0.97                   -
λ       Growth rate, relative to δγ, when there is loss of expression with positive control        -                      0.97
σ       Growth rate, relative to δγ, when there is superfluous expression with negative control    0.999                  -
σ       Growth rate, relative to γ, when there is superfluous expression with positive control     -                      0.999

^a See text for estimation of parameter values.

Reference mutation rate, μ: For E. coli, the spontaneous mutation rate is estimated to have a nominal value of μ_0 = 6E-10 per base per DNA replication (Drake 1991). The spontaneous mutation rate for loss of function in the modulator or promoter sites of our model can be determined from estimates of the spontaneous mutation rate per base and the number of critical bases that define these sites.

Relative mutation rate for loss of a high-level promoter site, π: Promoters encompass a region of ~75 nucleotides upstream of the RNA start site (Harley and Reynolds 1987; Lisser and Margalit 1993). Although most of the information in promoter sequences is localized within two 6-base blocks with variable spacing between them (the “consensus” elements at positions -10 and -35 relative to the start site), there is only limited base conservation at most positions. If we assume that each of 10 nucleotides is critical for the definition of a high-level promoter (Schneider et al. 1986), then the relative mutation rate is π_0 = 10.

Relative mutation rate for gain of a high-level promoter site, ν: Spontaneous up-promoter mutations with the positive mode occur at about one-tenth the frequency of spontaneous down-promoter mutations with the negative mode (G. Gussin, personal communication). If we assume that a low-level promoter can be converted to a high-level promoter by a single mutation in a critical base, then the relative mutation rate for this gain of a high-level promoter site is ν_0 = 1.

Relative mutation rate for loss of a regulator’s functional target site, τ: Targets for the binding of regulator proteins are the modulator sites: operator sites in the case of the negative mode and initiator sites in the case of the positive mode. Operator sites span a region of ~100 nucleotides upstream of the RNA start site (Gralla and Collado-Vides 1996). If we assume that each of 20 nucleotides is critical for the definition of the operator (Schneider et al. 1986), then the relative mutation rate in this case is τ0 = 20. Although the sizes of initiator sites are about one-half those of operator sites, 50 nucleotides upstream of the RNA start site (Gralla and Collado-Vides 1996), the information needed to locate these sites within the genome is similar to that for operators (Schneider et al. 1986). If we assume that each of 20 nucleotides also is critical for the definition of the initiator, then the relative mutation rate in this case is τ0 = 20.

Figure 2. Thresholds for selection of a wild-type regulatory mechanism with a negative mode. (A) The demand is represented with a logarithmic scale, and one sees that Dmin = 4.8E-6 and Dmax = 0.1. (B) The demand is represented with a linear scale. Results are shown for the nominal values of the parameters in Table 1. See text for discussion.

Relative mutation rate for loss of a functional regulator protein, ρ: We shall assume that a typical regulator protein has 30 amino acid residues that are critical for binding to its modulator site (operator in the case of the negative mode or initiator in the case of the positive mode) and for properly affecting transcription initiation. This implies that the regulator gene has ~60 bases that are critical because the identity of the base in the third codon position is largely irrelevant. Thus, we obtain a relative mutation rate of ρ0 = 60 for loss of a functional regulator protein.

Figure 3. Thresholds for selection of a wild-type regulatory mechanism with a positive mode. (A) The demand is represented with a logarithmic scale, and one sees that Dmin = 1 - 0.8 and Dmax = 1 - 1.5E-5. (B) The demand is represented with a linear scale. Results are shown for the nominal values of the parameters in Table 1. See text for discussion.

Relative mutation rate as a function of gene expression, ε: There is evidence to suggest that the rate of spontaneous mutation increases with the rate at which the DNA is being transcribed (Datta and Jinks-Robertson 1995). There also is evidence to suggest that the rate of spontaneous mutation decreases because of transcription-coupled repair mechanisms (Francino et al. 1996). This is one of the questions we wish to examine further in our quantitative analysis. We initially assume that there is no significant effect on the mutation rate either way; therefore, we assign a value of ε0 = 1 for this relative mutation rate. If one were to assume a change in mutation rate that is proportional (or inversely proportional) to the rate of transcription, then the mutation rate relative to the reference would be given by ε = k × 100 (or k/100), where k is the proportionality constant. (Recall that the capacity for regulation is assumed to be 100 and that expression is assumed to be fully ON or fully OFF.)

Figure 4. Thresholds for discriminate selection of wild-type regulatory mechanisms with negative or positive modes. The results in Figures 2B and 3B are shown here on the same axes, where it is clear that the Dmax for selection of the negative mode is less than the Dmin for selection of the positive mode.

Reference growth rate, γ: We shall assume that E. coli grows with a doubling time of 1 hr in the nutritionally richer of the two environments; thus, the nominal value for the reference growth rate is γ0 = 1.0. This is not an unreasonable value because it is known that bacteria like E. coli can double in a period as short as 20 min (Maaløe and Kjeldgaard 1965). In any case, the simple value of unity provides a convenient reference; should the actual value be different, one can simply rescale the time accordingly, and none of our results would change.

Relative growth rate with loss of normal expression, λ: Because expression is either fully ON or fully OFF and the capacity for regulation is 100, which supports the nominal growth rate, a failure of expression is assumed to result in a basal level of expression, which would reduce the growth rate 100-fold if there were no other carbon source in the environment. However, in the complex environment of the intestinal tract there are multiple carbon sources, and the reduction in growth rate will therefore be less. We shall assume a 3% reduction in growth rate. Thus, the nominal value for this parameter is set at λ0 = 0.97.

Relative growth rate with superfluous expression, σ: When the demand is such that a function is normally turned OFF and a regulatory mutation causes the function to be fully expressed under inappropriate circumstances, the cell unnecessarily expends resources for material and energy. Experimental evidence in the case of β-galactosidase expression in E. coli (Novick and Weiner 1957; Koch 1983) suggests that such inappropriate expression decreases the growth rate by <1%; we shall assume a 0.1% reduction. The growth rate, relative to the reference growth rate, is thus assigned a nominal value of σ0 = 0.999.

Relative growth rate in the more nutritionally deficient of the two environments, δ: From measurements of the mean transit time through the human intestinal tract (Cummings and Wiggins 1976; Gear et al. 1980), and the assumption that it is a well-stirred chemostat, one can calculate that the average doubling time for net growth of E. coli in the intestinal tract is about 40 hr (Savageau 1983). Because the intestinal tract is not a well-stirred chemostat, but rather a very heterogeneous environment in which the growth is undoubtedly faster in the proximal regions and slower in the distal, the doubling time of E. coli in the more deficient distal environment will be longer than the average. There are no good measurements to go by, so we will arbitrarily set the doubling time for growth in the more deficient environment to be two times the average value given above. Thus, the growth rate in the more deficient environment, relative to the reference growth rate in the richer environment, is given by a nominal value of δ0 = 0.0125.
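The arithmetic behind this nominal value can be checked with a short Python sketch. The variable names are illustrative assumptions, not notation from the article; the only inputs are the 1-hr reference doubling time, the ~40-hr average doubling time, and the arbitrary factor of two stated above.

```python
# Sketch: nominal relative growth rate in the more deficient environment.
# All names are illustrative; the numbers are those quoted in the text.
reference_doubling_time_hr = 1.0   # richer environment (reference growth rate of 1.0)
average_doubling_time_hr = 40.0    # average net doubling time in the intestinal tract
deficient_factor = 2.0             # arbitrary factor assumed in the text

deficient_doubling_time_hr = deficient_factor * average_doubling_time_hr

# Growth rate is inversely proportional to doubling time, so the relative
# growth rate is the ratio of the two doubling times.
delta_0 = reference_doubling_time_hr / deficient_doubling_time_hr

print(f"delta_0 = {delta_0}")
```

Changing the factor of two rescales δ0 inversely, which is one reason the sensitivity of the results to δ is examined later in the article.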

Criterion for selection, θ: Our criterion for selection is that each mutant population shall be reduced to no more than 0.05% of the wild-type population or, alternatively, that the sum of the two mutant populations shall be reduced to no more than 0.1% of the wild-type population. This is similar to values that are found in the literature (Leclerc et al. 1996). Thus, the criterion for selection is assigned a nominal value of θ0 = 0.0005.

Estimation of macroscopic parameters: The values of the macroscopic parameters in each environment and for each mode of control are determined as follows. First, the constituent parameters given above are combined to represent the relevant growth rates (gw, gp, gm, gd). The growth rate of the wild-type organism gw in the first environment is γ (the reference), and in the second it is γ multiplied by δ, the relative growth rate in the nutritionally deficient environment. The growth rates of the promoter mutants gp, modulator mutants gm, and promoter/modulator (double) mutants gd in the two environments are the same as those of the wild type, but multiplied when appropriate by relative growth rates that reflect either loss of expression that is normally ON (λ) or superfluous expression that is normally OFF (σ). For example, the growth rate of a lac modulator mutant (gm) is γ in the first environment, where its pattern of gene expression mimics that of the wild type, and γδσ in the second, where expression of the lac operon is superfluous. The growth rate of a mal modulator mutant is γ in the first environment, where its pattern of gene expression mimics the wild type, and γδλ in the second, where there is a failure to express the mal operon.

Figure 5. Influence of the constituent parameters on the values for Dmin and Dmax. Each parameter is varied about its nominal value, and the resulting values for Dmin and Dmax are calculated. Results for the parameter γ are not shown because it exhibits no influence in all cases. (A) Dmax for the negative mode of control. (B) Dmin for the negative mode of control. (C) Dmin for the positive mode of control. (D) Dmax for the positive mode of control. The local parameter sensitivities are summarized in Table 2.
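The growth-rate bookkeeping for the wild type and the two modulator mutants described above can be sketched in Python. The function and constant names are illustrative assumptions; the values are the nominal ones quoted in the text, and only the cases stated explicitly in the text are encoded.

```python
# Sketch of the growth rates in the two environments (nominal values).
GAMMA = 1.0      # reference growth rate in the richer (first) environment
DELTA = 0.0125   # relative growth rate in the deficient (second) environment
LAMBDA = 0.97    # relative growth rate with loss of normal expression
SIGMA = 0.999    # relative growth rate with superfluous expression


def wild_type_growth(environment: int) -> float:
    """g_w: gamma in environment 1, gamma * delta in environment 2."""
    return GAMMA if environment == 1 else GAMMA * DELTA


def lac_modulator_mutant_growth(environment: int) -> float:
    """g_m for lac: wild-type-like in environment 1, superfluous expression in 2."""
    if environment == 1:
        return GAMMA
    return GAMMA * DELTA * SIGMA


def mal_modulator_mutant_growth(environment: int) -> float:
    """g_m for mal: wild-type-like in environment 1, failure to express in 2."""
    if environment == 1:
        return GAMMA
    return GAMMA * DELTA * LAMBDA


for env in (1, 2):
    print(env, wild_type_growth(env),
          lac_modulator_mutant_growth(env),
          mal_modulator_mutant_growth(env))
```

The sketch makes the asymmetry visible: in the second environment the lac modulator mutant pays the small cost σ, whereas the mal modulator mutant pays the larger cost λ.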

Second, the constituent parameters are combined to represent the mutation rates between populations (mpw, mmw, mdp, mdm). Each mutation rate is given by the product of the number of critical bases that define the structure in question (π, ν, τ, or ρ), the spontaneous mutation rate per base per DNA replication (μ), and a factor reflecting transcription-related mutation or repair (ε) when appropriate. For example, the rate of production of lac promoter mutants from wild-type organisms (mpw) is πμε in the first environment, where the lac operon is being actively transcribed, and πμ in the second, where it is not. The rate of production of mal promoter mutants from wild-type organisms is νμ in the first environment, where the mal operon is not being transcribed, and νμε in the second, where it is.

Finally, the growth rates and mutation rates are combined to represent the macroscopic rate-constant parameters that characterize the population dynamics (aww, app, apw, amm, amw, add, adp, adm). For example, the rate constant for net growth of the promoter mutant app is given by its intrinsic growth rate gp minus the rate of loss due to the production of double mutants, which is given by the mutation rate per DNA replication mdp times the intrinsic growth rate of the promoter mutant gp. The rate constant for production of promoter mutants from the wild-type population apw is given by the mutation rate per DNA replication mpw times the intrinsic growth rate of the wild-type organism gw. The other rate-constant parameters are determined in a similar fashion. Thus,

$$a_{ww} = [1 - (m_{pw} + m_{mw})]\,g_w, \quad a_{pp} = (1 - m_{dp})\,g_p, \quad a_{pw} = m_{pw}\,g_w,$$

$$a_{mm} = (1 - m_{dm})\,g_m, \quad a_{mw} = m_{mw}\,g_w,$$

$$a_{dd} = g_d, \quad a_{dp} = m_{dp}\,g_p, \quad a_{dm} = m_{dm}\,g_m.$$
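As a minimal sketch of how the rate constants follow from the growth and mutation rates, the Python function below encodes the pattern described in the text for app and apw and applies it, by analogy, to the remaining constants. The function name, the dictionary keys, and all numerical inputs (including the value used for μ) are placeholders chosen for illustration, not values taken from Table 1.

```python
def rate_constants(g_w, g_p, g_m, g_d, m_pw, m_mw, m_dp, m_dm):
    """Combine growth rates (g) and per-replication mutation rates (m).

    Each population grows at its intrinsic rate minus what it loses to
    mutation; each mutational flow is a mutation rate times the growth
    rate of the population it leaves.
    """
    return {
        "a_ww": (1.0 - (m_pw + m_mw)) * g_w,  # net growth of wild type
        "a_pw": m_pw * g_w,                   # wild type -> promoter mutant
        "a_mw": m_mw * g_w,                   # wild type -> modulator mutant
        "a_pp": (1.0 - m_dp) * g_p,           # net growth of promoter mutant
        "a_dp": m_dp * g_p,                   # promoter mutant -> double mutant
        "a_mm": (1.0 - m_dm) * g_m,           # net growth of modulator mutant
        "a_dm": m_dm * g_m,                   # modulator mutant -> double mutant
        "a_dd": g_d,                          # growth of double mutant
    }


# Placeholder inputs (assumptions for illustration only).
MU = 1e-9            # hypothetical mutation rate per base per replication
PI, EPSILON = 10, 1  # nominal critical bases for promoter loss; transcription factor
m_pw = PI * MU * EPSILON

a = rate_constants(g_w=1.0, g_p=0.97, g_m=1.0, g_d=0.97,
                   m_pw=m_pw, m_mw=2e-8, m_dp=2e-8, m_dm=m_pw)
print(a)
```

A useful consistency check on any such reconstruction is that the outflows and the net growth of each population add back up to its intrinsic growth rate, for example aww + apw + amw = gw.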

SELECTION  OF  WILD-TYPE  REGULATORY 
MECHANISMS 

Determination of the thresholds for selection: The threshold for selection of a wild-type promoter or modulator is determined as follows. As described above, the constituent parameters are combined to represent the relevant growth rates and mutation rates that enter into the macroscopic parameters that characterize the population dynamics of mutant and wild-type organisms. The population dynamic equations are solved in the two environments to yield an equation for the ratio of mutant to wild-type population numbers. This ratio is set equal to θ, the criterion for selection, and the resulting equation expresses the relationship between cycle time C and the demand for gene expression D that constitutes the threshold for selection.

The threshold for selection of a wild-type promoter or modulator (regulator) can be obtained by solving the threshold equation for C as a function of D using the method of bisection (Press et al. 1988). These numerically calculated thresholds and the analytically determined asymptotes (Savageau 1998) are nearly indistinguishable, except in the region of transition between low- and high-C asymptotes. Only those values of C and D that lie in the region of overlap below both the promoter and modulator thresholds will allow selection of the wild-type regulatory mechanism.
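The bisection step mentioned above can be sketched generically in Python. The threshold equation itself is not reproduced in this section, so `toy_ratio` below is a hypothetical monotone stand-in for the mutant-to-wild-type ratio at fixed D, not the model's equation; only the root-finding routine is the point of the example.

```python
def bisect(f, lo, hi, tol=1e-9, max_iter=200):
    """Find a root of f in [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid               # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


THETA = 0.0005  # nominal criterion for selection


def toy_ratio(cycle_time):
    """Hypothetical placeholder: a ratio that rises linearly with cycle time."""
    return 1e-6 * cycle_time


# Threshold cycle time at which the placeholder ratio equals theta.
c_threshold = bisect(lambda c: toy_ratio(c) - THETA, lo=1.0, hi=1e6)
print(c_threshold)
```

Repeating the solve over a grid of D values would trace out a threshold curve of the kind plotted in the figures, assuming the real ratio is substituted for the placeholder.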


Regions in which selection for negative and positive modes is realizable: When nominal values are assumed for the parameters of the model (see Table 1), one finds the thresholds plotted in Figures 2 and 3. The thresholds shown in Figure 2 for the negative mode of control (3% selection coefficient against the promoter mutant and 0.1% against the modulator mutant) exhibit a narrow region of overlap. In contrast, the thresholds shown in Figure 3 for the positive mode of control (0.1% selection coefficient against the promoter mutant and 3% against the modulator mutant) exhibit a wide region of overlap. As predicted (Savageau 1998), the positive and negative modes are associated with asymmetric regions of the C vs. D plot in which selection is realizable. Note that the horizontal axis in the case of the positive mode is plotted as values of 1 - D, instead of D; this allows us to distinguish more clearly the threshold values near unity when plotting demand in logarithmic coordinates. The extent of the asymmetry is perhaps more evident when the thresholds for selection against the modulator mutants are plotted on the same scale for both the negative and positive modes (Figure 4). The regions for which selection is realizable are nonoverlapping, which indicates discriminate selection for the positive and negative modes.

Influence of parameters on minimum and maximum values for demand: Selection requires the demand for gene expression to be greater than the asymptote for the minimum threshold and less than the asymptote for the maximum threshold, that is, Dmin < D < Dmax. The parameters in our model influence these asymptotic values to various degrees. It is important to examine a range of values for each of the parameters because there is some uncertainty in the nominal values for many of them. We have systematically varied each parameter about its nominal value given in Table 1 and observed the resulting changes in the minimum and maximum values for demand. The results are shown in Figure 5.

Five classes of influence can be discerned in Figures 5A-5D. First, in many cases there is no discernible influence [Figure 5A (π and λ), Figure 5B (ρ, τ, ε, and σ), Figure 5C (ν, τ, ε, and σ), and Figure 5D (ρ, τ, and λ)]. Second, in several cases there is a nearly linear variation with the change in parameter value [Figure 5A (μ, ρ, ε, δ, and θ), Figure 5B (μ, π, δ, and θ), and Figure 5D (μ, ν, ε, δ, and θ)]. Third, in five cases there is a nearly cube-root influence [Figure 5A (τ) and Figure 5C (μ, ρ, δ, and θ)]. Fourth, in two cases there is a moderate (order of magnitude) amplification of the response to a change in parameter value [Figure 5B (λ) and Figure 5C (λ)]. Finally, in two cases there is an extreme (1000-fold) amplification of the response to a change [Figure 5A (σ) and Figure 5D (σ)]. The results obtained for the negative and positive modes exhibit different patterns. The influences in the local region about the nominal values can be summarized numerically by the parameter sensitivities (Shiraishi and Savageau 1992), as shown in Table 2.

TABLE 2

Influence of the constituent parameters on the values of Dmin and Dmax

                Parameter sensitivities (∂ log · /∂ log p) (a)
                Negative mode             Positive mode
Parameter, p    Dmin (b)    Dmax (c)      1 - Dmin (d)    1 - Dmax (e)
μ               1.00        -0.990        -0.205          1.00
π               1.00        0.000         n/a             n/a
ν               n/a         n/a           0.000           1.00
τ               0.000       -0.327        -0.0513         0.000
ρ               0.000       -0.769        -0.154          0.000
ε               0.000       -0.895        0.000           1.00
γ               0.000       0.000         0.000           0.000
λ (f)           32.3        0.000         -6.63           0.000
σ (g)           0.000       -989          0.000           1000
δ               1.00        0.895         0.208           1.00
θ               -1.00       0.989         0.205           -1.00

(a) Sensitivities are calculated with D for the negative mode and with 1 - D for the positive mode (see text for discussion).

(b) Dmin is determined by the threshold for selection of the wild-type promoter.

(c) Dmax is determined by the threshold for selection of the wild-type modulator-repressor interaction.

(d) Dmin is determined by the threshold for selection of the wild-type modulator-activator interaction.

(e) Dmax is determined by the threshold for selection of the wild-type promoter.

(f) For the negative mode, λ represents the growth rate of the promoter mutant relative to the wild type; for the positive mode, λ represents the growth rate of the modulator mutant relative to the wild type.

(g) For the negative mode, σ represents the growth rate of the modulator mutant relative to the wild type; for the positive mode, σ represents the growth rate of the promoter mutant relative to the wild type.
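A local parameter sensitivity of the kind reported in Table 2, the logarithmic derivative ∂ log D/∂ log p evaluated at the nominal value, can be estimated numerically with a central difference in log space. The Python sketch below is an illustration under that assumption; the function name and the cube-root test function are hypothetical stand-ins, since the model's threshold expressions are not reproduced in this section.

```python
import math


def log_sensitivity(f, p0, rel_step=1e-4):
    """Estimate d log f / d log p at p0 by a central difference in log space.

    f must return positive values near p0.
    """
    up = f(p0 * (1.0 + rel_step))
    down = f(p0 * (1.0 - rel_step))
    d_log_f = math.log(up) - math.log(down)
    d_log_p = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return d_log_f / d_log_p


def cube_root_law(p):
    """Hypothetical dependence used only to exercise the estimator."""
    return 3.0 * p ** (1.0 / 3.0)


print(log_sensitivity(cube_root_law, 20.0))    # close to 1/3
print(log_sensitivity(lambda p: 5.0 / p, 0.5)) # close to -1
```

Read this way, a sensitivity near 1 corresponds to a linear influence, a value near 1/3 to a cube-root influence, and a magnitude near 1000 to the extreme amplification described for σ.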

From these results one can see that the most influential parameter is the selection coefficient that reflects the diminished growth rate of the organism when there is superfluous gene expression. For the negative mode of gene control, this corresponds to the diminished growth rate of the modulator (regulator) mutants that express the effector function constitutively when it should be OFF. For the positive mode, this corresponds to the diminished growth rate of the promoter mutants that express the effector function at a high level when it should be OFF.


Figure 6. Enrichment of the wild-type regulatory mechanism with time. The numbers of wild-type, modulator-mutant, and promoter-mutant organisms are represented by the variables Xw, Xm, and Xp. Initially, the ratio Xw/(Xm + Xp), which is the reciprocal of the mutant fraction fm, is one-tenth of its steady-state (ss) value, and the two types of mutants are equally abundant. The ratio is normalized and plotted as a function of time in units of nC (see text for discussion). The normalized ratio is given by 10 × [Xw/(Xm + Xp)]/[Xw/(Xm + Xp)]ss, or 10 fm,ss/fm, so that its values vary between 0 and 1 on a logarithmic scale. Time courses are shown for various values of demand D. (A) The negative mode of gene control with a cycle time of C = 3000 hr. (B) The positive mode of gene control with a cycle time of C = 300 hr.

Time course of selection: The numbers of wild-type, modulator-mutant, and promoter-mutant organisms are represented by the variables Xw, Xm, and Xp. The ratio Xw/(Xm + Xp) is equal to the reciprocal of the mutant fraction, which we define as fm. If we start with equal numbers for the two types of mutants and a ratio Xw/(Xm + Xp) that is one-tenth of its steady-state value, then the enrichment of the wild-type regulatory mechanism with time is obtained from the solution of the population dynamic equations (Equation 9 in Savageau 1998). Given the nominal values for the parameters in Table 1, the time course of selection for various values of demand D is as shown in Figure 6. The temporal behavior of the populations is independent of the cycle time C. The time scale is actually discrete, given by values of nC, where n is the number of cycles. Thus, within a fixed time period, the same degree of enrichment can be achieved with either a large value for C and a small number n, or a small value for C and a larger number n. Note that the negative mode of gene control emerges more rapidly as demand for gene expression increases, whereas the positive mode of gene control emerges more rapidly as demand for gene expression decreases (Figure 6). The extent and rate of selection are examined in greater detail below.

Figure 7. The extent of selection as a function of demand for gene expression D and cycle time C. The extent of selection is given by the steady-state value of the ratio Xw/(Xm + Xp) or 1/fm. The parameters have the nominal values given in Table 1. (A) Negative mode of gene control. (B) Positive mode of gene control. See text for discussion.

Extent of selection: We define the extent of selection as the steady-state value of the ratio Xw/(Xm + Xp), which is the inverse of the mutant fraction in the population, fm. Although there is selection for the wild-type regulatory mechanism throughout the region of overlap beneath the thresholds (e.g., Figures 2A and 3A), the extent of the selection varies as a function of cycle time C and demand D. For a given value of C, the extent of selection reaches its maximum at a value of D that is roughly the geometric mean of its threshold values. With the nominal values for the parameters (Table 1), the results for the negative mode of gene control are as shown in Figure 7A; the results for the positive mode are similar to those for the negative mode, except that the allowable values for demand now occur in the high-demand region of the plot (Figure 7B). The maximum extent of selection for the positive mode of gene control is ~10-fold greater than that for the negative mode.

Rate of selection: The rate at which selection occurs is independent of cycle time. We define response time as the time required for the ratio Xw/(Xm + Xp) to reach 99% of its steady-state value, starting from an initial state in which the numbers of the two types of mutants are equal and the ratio is equal to one-tenth of its steady-state value. Recall that the time points are given in units of nC, where C is the cycle time and n is the number of cycles. The same temporal behavior is obtained regardless of whether C is large (n small) or small (n large). However, the resolution is poorer for large values of C because the minimum value of n is one.

Like the extent of selection, the rate of selection is strongly dependent on the demand for gene expression. Although selection in the case of the negative mode can occur near the lower limit of allowable values for D, the response time is very long. Response time decreases in an inverse fashion as D increases, until a lower plateau is reached (Figure 8A). The break in the curve occurs at approximately the value of D that yields the maximum extent of selection (see Figure 7A). The minimum response time with the nominal values for the parameters is ~294,000 hr (~36 yr).

Similar  results  are  found  with  the  positive  mode  of 
gene  control  (Figure  8B) ,  except  that  the  long  response 
times  occur  near  the  upper  limit  of  allowable  values  for 
D.  The  response  time  decreases  as  D  decreases  until  a 
minimum  is  reached,  and  then  it  increases.  For  the 
same  extent  of  selection  as  the  negative  mode  (18,400), 
the  positive  mode  exhibits  a  faster  response  time 
(^-T 7,000  vs.  294,000  hr);  alternatively,  for  the  same 
response  time  as  the  negative  mode  (294,000  hr),  the 
positive  mode  exhibits  a  greater  extent  of  selection 
(~214,000 vs. 18,400). Thus, it appears that the positive
mode  of  gene  control  is  capable  of  achieving  greater 
extents  of  selection  with  faster  response  times  than  is 
the  case  for  the  negative  mode  of  control. 

Minimum cycle time: Estimates for the minimum cycle time of E. coli passing from one host to another can be obtained by combining the information in Figure 2A with the inverse relationship between C and D for the lactose operon of E. coli. (Recall from the ecology and gene expression section that C = 3/D.) The intersection of this inverse relationship with the threshold for selection of the wild-type modulator (regulator) gives a value of Cmin = 26 hr (Figure 9A). Another estimate of minimum cycle time can be obtained by combining the information in Figure 3A with the inverse relationship between C and (1 - D) for the maltose operon of E. coli. [Recall from the ecology and gene expression section that C = 6/(1 - D).] The intersection of this inverse relationship with the threshold for selection of the wild-type modulator (regulator) gives a value of Cmin = 10 hr (Figure 9B).

Figure 8. The rate of selection as a function of demand for gene expression D. Response time is measured as the time required to achieve 99% of the steady-state value for the ratio Xw/(Xm + Xp) starting from an initial state in which this ratio is one-tenth of the steady-state value and the mutants are equally abundant. The parameters have the nominal values given in Table 1. (A) Negative mode of gene control. (B) Positive mode of gene control. See text for discussion.

Maximum cycle time: Estimates for the maximum cycle time of E. coli passing from one host to another also can be obtained by combining the information in Figure 2A with the inverse relationship between C and D for the lactose operon of E. coli. The intersection of the inverse relationship C = 3/D with the threshold for selection of the wild-type promoter (Figure 9A) gives a value of Cmax = 580,000 hr (~66 yr). Again, the data for the maltose operon in Figure 3A provide an alternative estimate. The intersection of the inverse relationship C = 6/(1 - D) with the threshold for selection of the wild-type promoter (Figure 9B) gives a value of Cmax = 502,000 hr (~57 yr).

Figure 9. Predicted values for cycle times Cmin, Cmax, and Cop based on the transit time through the proximal portions of the intestinal tract in humans. (A) Locus of C values for the negative mode of control is given by the 1/D relationship based on the period available for bacterial utilization of lactose (~3 hr). (B) Locus of C values for the positive mode of control is given by the 1/D relationship based on the period not available for utilization of maltose (~6 hr). The locus in each case intersects the threshold for selection of the wild-type modulator (regulator) at the value for Cmin and the threshold for selection of the wild-type promoter at the value for Cmax. The value of Cop is found on the locus at a value of Dop, which corresponds to the value of D that yields the optimum extent (Figure 7) and rate (Figure 8) of selection. See text for discussion.

Optimal cycle time: Although the estimates for minimum and maximum cycle time in the preceding sections are of some interest, perhaps the more relevant issue is the nominal value of the cycle time for E. coli in its natural environment. We argue that the most probable values for the cycle time will be those corresponding to the values for demand that lead to the optimal extent and rate of selection. For the negative mode, the optimum extent and rate of selection occur with a value of demand Dop = 0.001 (Figures 7A and 8A). Combining this optimum value for D with the inverse relationship C = 3/D in Figure 9A for the lactose operon yields an estimate for the nominal value of the cycle time, namely, Cop = 3000 hr (~4 mon). For the positive mode, the optimum extent and rate of selection occur with a value of demand 1 - Dop ~ 0.01 (Figures 7B and 8B). Combining this optimum value for 1 - D with the inverse relationship C = 6/(1 - D) in Figure 9B for the maltose operon yields an estimate of Cop = 800 hr (~33 days).
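The conversions between demand and cycle time used in these estimates can be sketched in Python. The function names are illustrative assumptions; the relationships C = 3/D and C = 6/(1 - D) and the quoted values are those stated in the text. Because the text gives 1 - Dop only approximately (~0.01), the sketch back-calculates the value of 1 - D implied by the reported Cop of 800 hr rather than assuming it.

```python
HOURS_PER_DAY = 24.0


def cycle_time_lactose(demand):
    """C = 3/D: cycle time (hr) for the lactose operon at demand D."""
    return 3.0 / demand


def cycle_time_maltose(one_minus_demand):
    """C = 6/(1 - D): cycle time (hr) for the maltose operon."""
    return 6.0 / one_minus_demand


# Negative mode: optimum demand quoted in the text.
c_op_lac = cycle_time_lactose(0.001)
c_op_lac_days = c_op_lac / HOURS_PER_DAY

# Positive mode: value of 1 - D implied by the reported C_op of 800 hr.
implied_one_minus_d = 6.0 / 800.0
c_op_mal_days = cycle_time_maltose(implied_one_minus_d) / HOURS_PER_DAY

print(c_op_lac, c_op_lac_days)
print(implied_one_minus_d, c_op_mal_days)
```

The lactose result corresponds to about 125 days, in line with the ~4 months quoted, and the maltose result to about 33 days; the implied 1 - D of 0.0075 is consistent with the rounded value of ~0.01 given in the text.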

DISCUSSION 

The application of demand theory presented in this article provides an opportunity to test a number of the theory’s quantitative implications. The results in Savageau (1998) led to the prediction of well-defined regions within the C vs. D plot where selection for the positive and negative modes of gene control is realizable. These regions allow one for the first time to specify precisely what is meant by high and low demand. With the nominal values for the parameters of the lactose and maltose operons in E. coli, selection of the negative mode of control requires a demand between 0.000005 and 0.1 (Figure 2A), whereas selection of the positive mode requires a demand between 0.2 and 0.999985 (Figure 3A). Furthermore, these regions were predicted to exhibit an inherent asymmetry with the positive mode having the larger region within which selection is realizable. This is clearly seen in the case of the lactose and maltose examples analyzed here (Figure 4).

Although the minimum and maximum values of demand are influenced by a number of parameters, by far the most influential parameter is σ, which reflects the reduction in growth rate when there is superfluous expression of a gene (Figure 5A and 5D). The nominal value for this parameter was set at 0.1%, on the basis of data for the lactose operon that suggest a value <1% for the reduction in growth rate of operator-constitutive mutants in a low-demand environment. In the case of the positive mode, the same value was used to characterize the reduction in growth rate of an up-promoter mutant in a low-demand environment. A 0.1% variation in σ yields a twofold change in the value of Dmax for both the negative and positive mode (Table 2). The remaining parameters have much less influence on the limits of D; approximately one-half exhibit a nearly linear influence, whereas the other half have a negligible influence.

The  ratio  of  mutation  rate  to  selection  coefficient  is 
the  most  relevant  determinant  of  the  realizable  region 
for  selection.  Indeed,  if  the  target  sizes  for  the  various 
types of mutations are increased by an order of magnitude
(e.g., to match the footprint for binding a regulator
protein  to  its  modulator  site  on  the  DNA)  at  the  same 
time  the  selection  coefficients  are  increased  by  an  order 
of  magnitude,  then  the  results  are  essentially  unchanged 
(data  not  shown). 

The results in Figure 5 suggest that the effect of
transcription on mutation rate may be significant only if
it reduces the mutation rate. The parameter ε, which
represents this effect, has no influence on the selection
of the wild-type promoter when there is a negative mode
of control (Figure 5B). This is counter to the intuitive
expectation that suggests a lower mutation rate would
aid the selection of the wild-type promoter when it is
not in use. The results in Figure 5A show that the
parameter ε can represent an increased selection for the
wild-type repressor-modulator interaction (increased Dmax)
if there is an increase in transcription-coupled repair
(decrease in ε). In the case of the positive mode, ε has
negligible influence on the selection of the wild-type
activator-modulator interaction (Figure 5C), and its
influence on the selection of the wild-type promoter
(Figure 5D) would appear to be of little consequence
because the threshold value of Dmax in this case is already
so high. Given the nominal values we have used for the
parameters, ε does not seem to be highly significant,
and similar effects can be achieved by varying other
parameters; nevertheless, ε might still be important for
selection under other conditions.

The equations that characterize the population dynamics
of mutant and wild-type organisms (Equations
7-11 in Savageau 1998) led to the prediction that the
extent of selection is a function of cycle time C and
maximal at intermediate values of demand D, whereas
the rate of selection is independent of cycle time C.
Indeed, with the parameter values in Table 1, the extent
of selection increases, reaches a maximum, and then
declines as demand increases (Figure 7). As seen in
Figure 8, the time required to reach full selection
decreases until a minimum is reached with increasing
demand (negative mode) or decreasing demand (positive
mode). The combination of these results suggests that
the optimum extent and rate of selection occur at
around D = 0.001 for the negative mode and 1 - D =
0.01 for the positive mode. In the case of the positive
mode, this represents a choice of 1 - D that yields a
rate of selection that is nearly equivalent to the optimum
for the negative mode.

The quantitative theory reveals a number of new
relationships involving cycle time that can be tested against
experimental data in the case of the lactose and maltose
operons of E. coli. The first such relationship provides
an estimate for the minimum value of the cycle time,
Cmin. We obtained values of 26 hr (Figure 9A) and 10
hr (Figure 9B), which are on the same order of magnitude
as the 40 hr required on average for transit through
the entire intestinal tract (Cummings and Wiggins
1976; Gear et al. 1980; Savageau 1983). Under these
circumstances, E. coli is simply passing through the
intestinal tract without colonizing the colon. Clearly, the
cycle time can be no shorter than this period.

The second relationship provides an estimate for the
maximum value of the cycle time, Cmax. We have
estimated this value to be ~580,000 hr (~66 yr) in the case
of the lactose operon (Figure 9A) and 502,000 hr (~57
yr) in the case of the maltose operon (Figure 9B). These
values for Cmax are on the same order of magnitude
as the 120-yr maximum for the life span of humans
(Hayflick 1977). Clearly, the cycle time for E. coli can
be no longer than the lifetime of the host because the
bacteria will die with the host if they do not colonize a
new host.

The final relationship provides an estimate for the
optimum value of the cycle time, Cop. The optimum
extent and rate of selection determined for the lactose
operon suggest a demand in the neighborhood of Dop =
0.001. This value of D, taken together with the
relationship D = 3/C, predicts an optimum cycle time of Cop =
3000 hr (~4 mon). The corresponding estimate based
on the maltose operon is Cop = 800 hr (~33 days).
These predicted values for the cycle time of E. coli are
comparable with the cycle times (recolonization rates)
of months to years that have been observed in humans
for resident strains of E. coli (Sears et al. 1950; Sears
and Brownlee 1952; Caugant et al. 1981).
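
The arithmetic behind these cycle-time estimates can be checked with a short Python sketch. It assumes only the relationship stated in the text for the lactose operon (D = 3/C, i.e. a fixed 3-hr exposure per cycle); the function name `cycle_time_from_demand` and the use of a 365.25-day year for unit conversion are assumptions made here for illustration.

```python
HOURS_PER_DAY = 24.0
HOURS_PER_YEAR = 24.0 * 365.25  # assumed conversion factor (Julian year)


def cycle_time_from_demand(exposure_hr, demand):
    """Cycle time C (hr) implied by a fixed exposure period and demand D.

    Demand is the fraction of the cycle during which expression is needed,
    so D = exposure / C and therefore C = exposure / D.
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    return exposure_hr / demand


if __name__ == "__main__":
    c_op = cycle_time_from_demand(3.0, 0.001)      # lactose: D = 3/C
    print(c_op, "hr =", c_op / HOURS_PER_DAY, "days")
    print(800 / HOURS_PER_DAY, "days")             # maltose Cop from the text
    print(580_000 / HOURS_PER_YEAR, "yr")          # lactose Cmax from the text
    print(502_000 / HOURS_PER_YEAR, "yr")          # maltose Cmax from the text
```

The converted values agree with the rounded figures given in the text (about 4 months, 33 days, 66 yr and 57 yr, respectively).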

In  summary,  the  quantitative  development  of  demand 
theory  presented  in  Savageau  (1998)  and  applied  here 
provides the first estimates for the minimum and maximum
values of demand that are required for selection
of the positive and negative modes of gene control. The
specific application to the maltose and lactose operons
of E. coli suggests that the positive and negative modes
of  control  for  these  genes  are  subject  to  selection 
throughout  the  full  range  of  cycle  times  that  are  possible 
for  this  microbe.  Moreover,  the  cycle  times  predicted 
on  the  basis  of  the  optimal  extent  and  rate  of  selection 
are  in  agreement  with  the  typical  cycle  times  that  have 
been  observed  experimentally. 
Introduction

The need for a systems perspective in biology
has never been more apparent than it is today.
The accumulation of data concerning the basic
determinants of biological systems was greatly
accelerated when genetics, biochemistry and
microbiology were fused to form molecular
biology in the 1950s. However, this information
has remained largely incomplete and fragmented.
The new technologies that have grown out of the
Human Genome Project have introduced a radically
different approach, based on global
measurements of the organism's phenotype.
These global expression systems are producing a
flood of data that must be related to the underlying
molecular determinants. Without a quantitative
systems theory with which to relate the
information at these different levels of organization,
our understanding will remain descriptive
and lack predictive value.

Biochemical Systems Theory [1] is concerned
with understanding integrated (systems
level) behaviour in terms of the underlying
(molecular level) determinants. This theory is
based upon the power-law formalism [2], which
provides a flexible, accurate and tractable mathematical
representation for characterizing system
components and their interactions. Biochemical
Systems Theory provides methods of analysis
that are capable of extracting information that is
latent within the mathematical representation of
the integrated system. Most importantly, Biochemical
Systems Theory gives us a strategy for
making well controlled comparisons that are at
the heart of biological understanding in an evolutionary
context. The primary aim of Biochemical
Systems Theory is to elucidate the design principles
that characterize intact biological systems.

Biochemical  Systems  Theory  has  been 
applied  to  several  generic  classes  of  metabolic 
pathways  and  gene  circuits.  Here  I  will  examine 
three  elements  of  design  for  a  generic  class  of 
inducible  gene  circuits:  threshold  generation, 
gene coupling and mode of control. The principles
that have been discovered in each case will
be discussed in the context of the lactose (lac)
system of Escherichia coli. I will finish with a few
general  conclusions  that  can  be  drawn  from 
these  examples. 


Threshold  generation 

A sharp threshold for induction of a catabolic
system will prevent premature induction of the
catabolic machinery when there is an inadequate
(subthreshold) supply of substrate in the organism’s
environment. Conversely, a sharp threshold
will produce a highly responsive induction when
the substrate supply is suprathreshold and sufficient
not only to recoup the cost of synthesizing
the catabolic machinery but also to provide the
organism with excess carbon and energy for cellular
growth and function [3]. Two alternative
means for the generation of a sharp threshold are
static and dynamic switches.

An example of a static switch is provided by
an inducible catabolic system in which the substrate
is the inducer that interacts with the regulator
protein to produce a sigmoidal influence on
the rate of mRNA transcription; all other steps in
the system are operating in a first-order fashion.
If one increases the concentration of substrate
(inducer) slowly from a low to a high value, gene
expression will exhibit a low value that changes
slowly, then accelerates through the mid-range
values, and finally achieves a high value that
again changes only slowly. If one decreases the
concentration of substrate slowly from high to
low values, the same values for gene expression
will be retraced, but in the opposite order. Thus
the process is completely reversible, with no
memory for the past history of substrate concentration.
The greater the degree of sigmoidicity
(the larger the Hill number), the sharper the
switch from low to high values for gene expression.
(In the mathematical limit of an infinite
Hill number, the sigmoidal characteristic
approaches a step function with only two values: a
low value for expression when substrate concentrations
are subthreshold, and a high value when
they are suprathreshold.) This substrate induction
model describes the lac system in a strain of
E. coli that is freely permeable to the gratuitous
inducer isopropyl β-D-thiogalactoside [4].
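
The behaviour of a static switch can be illustrated with a Hill function. This is a minimal Python sketch, not the power-law model used in the work cited: the function name `hill_expression`, the unit threshold and the 0-to-1 expression scale are assumptions chosen for illustration. Because expression depends only on the current substrate concentration, a downward sweep retraces the upward sweep, and a larger Hill number gives a sharper transition.

```python
def hill_expression(substrate, threshold=1.0, hill_number=2.0,
                    low=0.0, high=1.0):
    """Steady-state expression for a substrate-induced (static) switch."""
    if substrate < 0:
        raise ValueError("substrate concentration must be non-negative")
    s_n = substrate ** hill_number
    k_n = threshold ** hill_number
    return low + (high - low) * s_n / (k_n + s_n)


if __name__ == "__main__":
    concentrations = [0.25, 0.5, 1.0, 2.0, 4.0]
    for n in (1, 4, 16):
        up = [hill_expression(s, hill_number=n) for s in concentrations]
        down = [hill_expression(s, hill_number=n)
                for s in reversed(concentrations)]
        # The downward sweep retraces the upward sweep: no memory.
        assert up == list(reversed(down))
        print(n, [round(v, 3) for v in up])
```

As the Hill number grows, values below the threshold approach the low level and values above it approach the high level, which is the step-function limit mentioned in the text.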

An example of a dynamic switch is provided
by an inducible catabolic system that is similar to
that described above, except that the product
rather than the substrate of the induced enzymes
is the inducer of gene expression. The properties
of such a switch are quite different from those of
the static switch. If one increases the concentration
of substrate slowly from a low to a high
value, the rate of mRNA synthesis will exhibit a
low value that changes slowly, then jumps
discontinuously to a higher value once a threshold
concentration has been crossed, and finally
remains at a high value that again changes only
slowly. If one now decreases the concentration of
substrate from high to low values, the rate of
mRNA synthesis will exhibit a high value that
changes slowly, then jumps discontinuously to a
lower value once a second (lower) threshold
concentration has been crossed, and finally remains
at a low value that again changes only slowly. For
substrate concentrations above the higher
threshold and below the lower threshold, the
system exhibits a single steady state. For substrate
concentrations between these two threshold
values, the system can be in one of two
different stable steady states, depending upon
which threshold was the last to be crossed. In
this sense, the process is irreversible, with a
memory of its past.
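
The hysteresis of a dynamic switch can be reproduced with a toy model. The Python sketch below is an assumption-laden illustration, not the model analysed in the cited work: expression x relaxes towards basal + p^n/(1 + p^n), where the inducer p is taken to be proportional to substrate times x (a product of the induced enzymes). The parameter values, the Euler integration and the function names are all hypothetical.

```python
def relax_to_steady_state(substrate, x0, basal=0.05, hill_number=4,
                          steps=5000, dt=0.05):
    """Integrate dx/dt = basal + p^n / (1 + p^n) - x, with p = substrate * x.

    x is a scaled expression level; the inducer p is a product of the
    induced enzymes, so it grows with both substrate and expression.
    """
    x = x0
    for _ in range(steps):
        p_n = (substrate * x) ** hill_number
        x += dt * (basal + p_n / (1.0 + p_n) - x)
    return x


def sweep(substrates, x_start):
    """Follow the steady state while the substrate level is changed slowly."""
    levels, x = [], x_start
    for s in substrates:
        x = relax_to_steady_state(s, x)
        levels.append(x)
    return levels


if __name__ == "__main__":
    S = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 10.0, 20.0]
    up = sweep(S, 0.05)                      # slowly increasing substrate
    down = sweep(S[::-1], up[-1])[::-1]      # slowly decreasing substrate
    for s, u, d in zip(S, up, down):
        print(f"substrate={s:5.1f}  up={u:.3f}  down={d:.3f}")
```

With these assumed parameters, intermediate substrate levels give a low value on the upward sweep and a high value on the downward sweep, whereas the two sweeps agree at the lowest and highest substrate levels.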

Induction of the lac system exhibits a dynamic
switch when wild-type E. coli is exposed to lactose
[5]. The conventional explanation suggests
that transport of lactose into the cell is ‘autocatalytic’
in the following sense. The intracellular
product of transport is the inducer, which causes
further induction of the transport system, which
in turn leads to a further increase in the concentration
of inducer. Indeed, a dynamic switch can
be realized by a product-inducible catabolic
system, as described in the previous paragraph.
However, the natural inducer of the lac operon is
not a product of the system, but allolactose, an
intermediate whose synthesis and degradation
are both induced. When the position of the
natural inducer is shifted from product to intermediate
of the catabolic pathway, the same model
that produced a dynamic switch now produces a
static switch. Thus a dynamic switch is difficult
to reconcile with the current model of the lac
system, i.e. an unbranched pathway consisting of
lactose transport (LacY) [6] and catabolism
(LacZ) [7] that is subject to co-ordinate induction
by an intermediate that has a sigmoidal
influence on the control of transcription.

Biochemical Society Transactions

One way to rectify this inconsistency is to
postulate a non-inducible alternative fate for the
intermediate. If the alternative fate represents a
minor contribution to the total degradation of
intermediate (Figure 1A), then the system
behaves essentially like an intermediate-induced
system. As the concentration of substrate
(extracellular lactose) is slowly increased, gene
expression follows the static switch associated with the
sigmoidal control of transcription (Figures 1B
and 1C). If the alternative fate represents a
major contribution to the total degradation of
intermediate (Figure 1D), then the system
behaves essentially like a product-induced
system. Gene expression will then exhibit the
following pattern as the concentration of substrate
(extracellular lactose) is slowly increased
(Figures 1E and 1F): (1) expression at first
shows little change along the lower portion of the
sigmoidal characteristic; (2) it jumps discontinuously
to the upper portion as the upper threshold
of substrate concentration is crossed (Figure 1E,
curve a); and (3) it changes little thereafter along
the upper portion of the sigmoidal characteristic.
Gene expression follows a different pattern as
the concentration of substrate (extracellular lactose)
is slowly decreased: (1) expression at first
shows little change along the upper portion of
the sigmoidal characteristic; (2) it jumps
discontinuously to the lower portion as the lower
threshold of substrate concentration is crossed
(Figure 1E, curve b); and (3) it changes little
thereafter along the lower portion of the sigmoidal
characteristic. The system can remain in
either the upper or the lower stable state when
substrate concentrations have values between the
lower and upper thresholds (Figure 1E, curve c).
The net effect of substrate concentration on
mRNA levels is shown in Figures 1C and 1F.
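
The effect of a non-inducible alternative fate can be sketched by counting steady states. The Python example below is a hypothetical toy model, not the published analysis: inducer synthesis is assumed to be substrate * E (induced transport), inducer loss is assumed to be (E + k_alt) * inducer, where k_alt stands for the non-inducible fate, and expression E is assumed to follow a Hill function of the inducer. All names and parameter values are assumptions for illustration.

```python
def count_steady_states(substrate, k_alt, basal=0.05, hill_number=4,
                        grid_points=12001, e_max=1.2):
    """Count steady states of expression E at one substrate level.

    Assumed inducer balance: synthesis = substrate * E, loss =
    (E + k_alt) * inducer, so inducer = substrate * E / (E + k_alt).
    Assumed expression: E = basal + inducer^n / (1 + inducer^n).
    """
    def residual(e):
        inducer = substrate * e / (e + k_alt)
        i_n = inducer ** hill_number
        return basal + i_n / (1.0 + i_n) - e

    signs = [residual(e_max * i / (grid_points - 1)) > 0
             for i in range(grid_points)]
    # Each sign flip marks one intersection of the two curves.
    return sum(a != b for a, b in zip(signs, signs[1:]))


if __name__ == "__main__":
    for s in (0.3, 1.0, 3.0, 10.0, 30.0):
        print("minor fate, substrate", s, count_steady_states(s, 0.001))
    for s in (3.0, 30.0, 300.0):
        print("major fate, substrate", s, count_steady_states(s, 10.0))
```

With a minor alternative fate (small k_alt) the sketch finds a single steady state at every substrate level tested, as for a static switch. With a major alternative fate (large k_alt) it finds three steady states (two stable, one unstable) at intermediate substrate levels, as for a dynamic switch.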

The design principles for the generation of
thresholds in these instances are revealed at the
level of the integrated gene circuitry. Although an
essential feature of both types of switches, i.e.
the sigmoidal rate of mRNA synthesis, can be
elucidated by studies of transcription initiation
with isolated molecular components in vitro, this
information alone is insufficient to distinguish
between the two types. The effects of induction
on the synthesis and degradation of inducer also
play a critical role. This shows clearly that the
induction characteristics are a property of the intact
system.

Coupling  of  elementary  circuits 

Genes interact to produce complex patterns of
expression that define the phenotype of the
organism. The best studied examples of gene
interaction and the patterns of coupled expression
that result are provided by elementary gene
circuits in bacteria (Figure 2A). The expression
of an effector gene and its cognate regulator gene
exhibits one of three distinct patterns of coupling.
Induction of effector gene expression
(Figure 2C) is accompanied by an increase, a
decrease or no change in regulator gene expression
(Figure 2B). Each of these three forms of
coupling has been documented experimentally.
However, until recently it was unclear whether
the form of coupling follows some underlying
rule or is the result of a frozen accident in the
evolution of a given system. (For simplicity, in
what follows, I shall ignore the inversely coupled
patterns, since the decrease in regulator expression
is small and in many cases difficult to
distinguish from the case of uncoupled expression.)

Figure 1

Thresholds for induction of gene expression

When the effect of induction on the synthesis of an inducer is less than or equal to that on
degradation (A), gene expression exhibits a static switch (B, C), whereas when the effect on
the synthesis of an inducer is greater than that on degradation (D), gene expression exhibits
a dynamic switch (E, F). At any fixed concentration of substrate, the steady-state levels of
mRNA and inducer are given by the intersection of the sigmoidal curve (the effect of inducer
on mRNA levels) and the appropriate broken curve (the effect of mRNA on inducer levels)
(B, E). The net effect of substrate concentration on mRNA levels is shown in (C) and (F).
See the text for discussion.

Figure 2

Coupling of expression in elementary gene circuits

(A) Linked circuits for regulator and effector genes. (B) Regulator expression can be directly
coupled, uncoupled or inversely coupled with (C) effector expression. Abbreviations: NA,
nucleic acids; AA, amino acids.

There is now evidence to suggest that the
pattern of coupling in these elementary gene circuits
is governed by a simple design principle,
which was discovered as follows. Biochemical
Systems Theory was used to model a genetic
system of elementary gene circuits capable of
representing each of the distinct forms of coupling.
The equations describing the system were
solved both analytically and numerically over a
wide range of parameter values. The resulting
behaviour of the system with different parameter
settings could then be classified on the basis of
several criteria for functional effectiveness,
including sharp threshold for induction, large
logarithmic gain in product formation, robustness
of the system in the face of parameter variation,
regulator selectivity, system stability and
temporal responsiveness. Finally, a strategy for
making well-controlled comparisons among the
results was followed, and a simple rule emerged.
For systems that were well designed according to
these a priori criteria, we discovered that if the
regulator protein exerts negative control over
effector gene expression, then expression of the
regulator and effector genes is uncoupled when
the capacity for induction is large and strongly
coupled when the capacity for induction is small.
Conversely, if the regulator protein exerts positive
control over effector gene expression, then
expression of the regulator and effector genes is
uncoupled when the capacity for induction is
small and strongly coupled when the capacity for
induction is large [8,9].
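
The coupling rule stated above amounts to a four-entry decision table. The Python sketch below simply encodes that table; the function name `predicted_coupling` and the string labels are assumptions introduced for illustration, and the sketch ignores the inversely coupled pattern, as the text does.

```python
def predicted_coupling(mode, induction_capacity):
    """Predicted coupling of regulator and effector gene expression.

    mode: "negative" or "positive" control by the regulator protein.
    induction_capacity: "large" or "small" capacity for induction.
    """
    rule = {
        ("negative", "large"): "uncoupled",
        ("negative", "small"): "strongly coupled",
        ("positive", "small"): "uncoupled",
        ("positive", "large"): "strongly coupled",
    }
    try:
        return rule[(mode, induction_capacity)]
    except KeyError:
        raise ValueError("mode must be 'negative' or 'positive'; "
                         "capacity must be 'large' or 'small'") from None


if __name__ == "__main__":
    # A negatively controlled system with a large capacity for induction.
    print(predicted_coupling("negative", "large"))
```

A negatively controlled system with a large capacity for induction is predicted to be uncoupled.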

The lac system, with negative control and a
high capacity for induction, presents a superb
example of this rule. Specific control of the lac
system involves a negative regulator, the classical
lac repressor. The capacity for β-galactosidase
induction is large (approx. 1000-fold), whereas
the capacity for lac repressor induction is essentially
zero [10]. Thus expression of the regulator
and effector genes in the lac system is
uncoupled, as expected for a system with a negative
regulator and a large capacity for induction.

These predictions were tested against available
experimental data for an additional 30
systems in which expression of both the regulator
and effector genes had been measured, and
agreement  was  found  between  the  experimental 
measurements  and  the  predicted  pattern  of 
coupling  of  gene  expression  [9].  This  is  an 
example  of  a  design  principle  that  is  manifested 
at  the  level  of  integrated  gene  circuits.  It  could 
not  have  been  discovered  through  the  separate 
analysis  of  the  individual  circuits  in  isolation. 

Molecular  mode  of  control 

The molecular mode of gene control exhibits a
fundamental duality common to most, if not all,
control systems. Control can be achieved either
by removing a restraining element (the negative
mode of control) or by providing a stimulatory
element (the positive mode of control). Numerous
examples of each mode have been documented
in the literature. Although it was initially
unclear whether rule or accident dictated the use
of the positive or the negative mode in any particular
system, Biochemical Systems Theory has
led to the discovery of a surprisingly simple
design principle that predicts the use of positive
or negative control.

Biochemical  Systems  Theory  was  used  to 
model  systems  exhibiting  each  of  the  alternative 
modes.  An  exhaustive  analysis  showed  that,  in 
most  respects,  the  alternative  systems  behaved  in 
an  identical  fashion.  That  is,  systems  with  either 
the positive or the negative mode were functionally
equivalent and could control gene expression
equally  well.  However,  their  behaviour  differed  in 
diametrically  opposed  ways  in  response  to 
genetic  mutations  that  alter  the  components  of 
the  control  system  itself  [3].  In  response  to 
damaging  mutations,  systems  with  the  negative 
mode  of  control  exhibited  superfluous  expression 
of  the  effector  gene  when  it  should  be  turned 
OFF,  whereas  systems  with  the  positive  mode  of 
control  failed  to  express  the  effector  gene  when 
it  should  be  turned  ON.  A  selectionist  argument 
based  on  the  population  dynamics  of  mutant  and 
wild-type  organisms  in  different  environments 
leads  to  the  following  principle:  the  positive 
mode  of  control  will  prevail  when  there  is  a  high 
demand  for  effector  gene  expression  in  the 
organism’s  natural  environment,  whereas  the 
negative  mode  will  prevail  when  there  is  a  low 
demand  for  expression  [11,12]. 


This  demand  theory  of  gene  regulation  has 
now  been  developed  in  a  more  quantitative 
fashion  [13],  with  key  roles  being  played  by  two 
parameters.  One  is  the  average  cycle  time,  C, 
taken  for  a  gene  to  go  from  the  OFF  state  to  the 
ON  state  and  back  to  the  OFF  state  (Figure  3B). 
The  other  is  the  demand  for  expression,  D, 
which  is  the  fraction  of  the  cycle  time  during 
which  a  gene  is  turned  ON  (Figure  3C).  The 
solution  of  the  population  dynamic  equations  for 
mutant  and  wild-type  organisms  in  alternative 
environments  reveals  a  threshold  for  selection  in 
the  C  against  D  plot.  Selection  of  the  wild-type 
control  system  can  be  realized  only  when  values 
of  C  and  D  lie  below  the  threshold  (Figure  3D). 

The  regions  of  the  C  against  D  plot  within 
which selection is possible differ for the alternative
modes of control. The realizable region
occurs  at  low  values  of  D  for  the  negative  mode 
and  at  high  values  of  D  for  the  positive  mode, 
and  these  regions  exhibit  an  inherent  asymmetry, 
with  the  realizable  region  for  the  positive  mode 
being  larger. 

Application of this theory to the specific case
of the lactose operon in E. coli cycling through
the human intestinal tract (Figure 3A) yields
several interesting predictions that relate to the
host [14]. When the extremes of the realizable
region intersect the inverse relationship between
C and D, which is due to the fixed 3-h period of
exposure to lactose during transit through the
upper portion of the intestinal tract [15,16], one
obtains predictions for the minimum and maximum
cycle time (Figure 3D). The minimum
value for the cycle time is approx. 26 h, which is
roughly the time required for transit through the
entire intestinal tract [17-19]. Under these conditions,
E. coli passes through one intestinal tract
after another as fast as possible without colonizing
a single colon. The maximum value for the
cycle time is approx. 580 000 h (~66 years),
which is roughly equivalent to the life span of
humans [20]. A longer cycle time would be
impossible, because the colonizing bacteria
would die with the host before recolonizing a
new host.

This demand theory also makes predictions
regarding the rate (Figure 3E) and extent
(Figure 3F) of selection of the wild-type control
system, and these exhibit optima that can be
used to predict the nominal value for demand,
which in turn leads to a prediction for the nominal
cycle time (Figure 3D). The optimal extent
and rate of selection determined for the lactose
operon suggest a demand (D) in the neighbourhood
of 0.001. This value of D, taken together
with the inverse relationship D = 3/C, predicts an
optimal cycle time of 3000 h (~4 months), which
is comparable with the cycle times (recolonization
rates) that have been observed in humans
for resident strains of E. coli [21-23].

Demand  theory  provides  an  example  of  a 
biological  design  principle  that  only  becomes 
manifest  at  the  population  level  in  the  context  of 
natural  selection.  Again,  the  demand  principle 
could  not  have  been  discovered  by  examining 
molecular  structures  and  interactions  alone,  nor 
could it have been discovered at the level of integrated
gene circuits alone. The demand principle
requires  consideration  of  mutants  and  the 
environment  external  to  the  system. 

Discussion 

The  details  of  Biochemical  Systems  Theory  that 
are  involved  in  the  applications  presented  here 
are  beyond  the  scope  of  this  paper.  However,  a 
few  general  comments  are  in  order. 


Biochemical  Systems  Theory  has  several 
general  goals,  including  answers  to  the  questions 
What, How and Why. What are the relevant components
of the system under study? How do
these  components  interact  so  as  to  produce  the 
behaviour  that  is  observed  in  the  real  system? 
Why  is  the  system  designed  in  this  particular  way 
and  not  some  other?  These  and  other  goals  are 
not  unrelated,  since  the  pursuit  of  one  goal  will 
often  provide  information  that  is  needed  in  the 
pursuit  of  another.  However,  the  primary  goal  of 
this  theory  is  to  discover  the  biological  design 
principles that emerge at each level of organization
in various generic classes of systems.

What is common to these successful explanations
of design, and can this success be
extended to more complex gene circuitry [24]?
There are two aspects of these examples that
seem most important. First, in each case one is
able to identify a limited number of possible variations
on a theme: static versus dynamic
switches; coupled versus uncoupled circuits;
positive versus negative modes of control. Even
though these variations were not understood
initially, there was the prospect that a simple
rule could be found, if indeed there was one,
because of the limited number of variations in
design that had to be analysed. Secondly, by an
appropriate choice of organizational level and
type of representation, one could obtain simple
equations whose structure was amenable to
qualitative analysis (and to exhaustive numerical
analysis when necessary), and this leads to
general results that are independent of specific
values for the parameters. This is important,
because many of the parameter values for any
system will always be unknown. The success of
Biochemical Systems Theory in elucidating
general rules that are consistent with patterns
found in nature is indicative of the power of this
approach.

Figure 3

Selection for the negative mode of control

(A) Two regions of the human intestinal tract. (B) Cycling between the regions of high (H)
and low (L) demand for lactose expression in E. coli. (C) Definition of average cycle time (C)
and average demand (D) for gene expression. (D) Cycle time for selection. (E) Rate of
selection and (F) extent of selection, as a function of demand. See the text for discussion. 1E
indicates exponential notation, i.e. 1E+2 = 10^2; 1E-7 = 10^-7; 1E+0 = 10^0, etc.

It  is  clear  that  the  elucidation  of  design 
principles  and  the  compilation  of  molecular 
detail lead to very different kinds of understanding.
This is seen in the examples considered in
the previous sections. Numerous experimental
studies of gene circuitry over the past 30 years
have  produced  an  abundance  of  molecular 
descriptions  and  have  documented  the  existence 
of  each  of  the  design  features  considered  here. 
However,  these  experimental  results  provided  no 
insight  into  the  underlying  principles  that  govern 
these designs. A focus on the kind of understanding
that emerges from knowledge of the
underlying design principles will become
increasingly important in biochemistry, not only
for advancing our research programmes, but also,
and perhaps more importantly, for instructing
the next generation of students. The current
approach is overwhelming students with enormous
amounts of information to memorize and
providing  less  and  less  motivation  for  them  to  do 
so.  A  focus  on  design  principles  would  provide 
deeper  understanding,  diminish  the  burden  of 
memorization,  and  integrate  their  understanding 
into a broader and more meaningful context.


Abstract

The physical basis for biological complexity is context-dependent expression of the organism’s
genome. The context is provided by the life cycle of the organism; the molecular
mechanisms  of  gene  regulation  interpret  that  context.  The  relationship  between  these  two 
different  hierarchical  levels  of  organization  -  genotype  and  phenotype  -  has  traditionally 
been  approached  from  the  bottom-up  perspective.  Regulation  of  many  gene  systems  has 
been  studied  in  detail,  and  the  results  have  revealed  an  enormous  diversity  of  molecular 
elements  and  circuits.  We  are  just  beginning  to  understand  the  functional  implications  of 
such  variations  in  design  and  to  grasp  the  factors  that  have  influenced  their  evolution.  The 
relationship  between  genotype  and  phenotype  can  now  be  approached  from  the  top-down 
perspective  as  well.  The  new  technologies  that  have  grown  out  of  the  Human  Genome 
Project  have  introduced  a  radically  different  approach  based  on  global  measurements  of 
the  organism’s  phenotype.  However,  there  are  theoretical  limits  regarding  the  extent  to 
which  knowledge  of  the  underlying  mechanisms  can  be  determined  solely  on  the  basis  of 
systemic  measurements.  Success  in  relating  genotype  to  phenotype  will  ultimately  require 
a  combination  of  both  the  top-down  and  bottom-up  approaches.  It  also  will  require  an 
appropriate  systems  theory  for  relating  the  information  at  these  two  levels  of  organization. 
Without  a  quantitative  systems  theory  to  relate  the  information  at  these  different  levels 
of  organization  our  understanding  will  remain  descriptive  and  lack  predictive  value.  I  will 
describe  recent  work  on  a  quantitative  theory  that  relates  molecular  mechanisms  of  gene 
control  to  the  organism’s  physiological  behavior  in  its  natural  environment.  When  applied 
to the lactose operon of Escherichia coli in the human intestine, the theory predicts selection
for the correct mode of gene control. It also makes surprising predictions concerning
the  organism’s  phenotype  and  habitat. 


Introduction 

What is the function of regulatory gene circuitry? The superficial answer is fairly obvious.
The genotype is determined by the information encoded in the DNA sequence, the
phenotype is determined by the context-dependent expression of the genome, and the regulatory
circuitry interprets the context and orchestrates the expression. However, a deeper
look reveals a hierarchy of mechanisms linking the genotype to the phenotype. At each
level  of  this  hierarchy  there  are  various  designs  that  are  poorly  understood.  While  some 
of  the  differences  may  be  attributed  to  historical  accidents  with  no  attendant  functional 
implications,  others  are  governed  by  rules  that  result  from  natural  selection  [1]. 

The  current  goal  of  functional  genomics  is  identifying  the  function  of  a  gene  product 
that is encoded in a particular string of nucleotides. Given a sequence, can one predict that
it codes for a dehydrogenase? From the particular combination of domains, can one further
deduce that it is in fact a homoserine dehydrogenase? These goals for relating genotype
to  phenotype,  though  still  fairly  modest,  have  yet  to  be  fully  realized.  Nevertheless,  it 
is  instructive  to  look  beyond  the  current  status  and  examine  the  prospects  of  achieving 
the  ultimate  goal  of  functional  genomics,  which  is  to  relate  the  nucleotide  sequence  of  the 
genome  to  the  expression  of  function  in  time  and  space  given  an  appropriate  context. 

In  this  paper  I  will  briefly  review  various  hierarchical  levels  of  organization  from  DNA 
sequence  to  environmental  context,  summarize  results  from  three  different  analyses  that 
connect  the  several  levels  of  organization,  and  then  show  how  these  results  provide  a  self- 
consistent  relationship  between  genotype  and  phenotype  of  the  organism  in  its  natural 
environment  in  the  case  of  a  simple  well-studied  system,  the  lactose  operon  of  Escherichia 
coli. The results in this case suggest that reaching the ultimate goal of functional genomics
will involve more than recognizing complex patterns in the DNA sequence. It will
require  methods  to  elucidate  and  to  quantify  the  complex  web  of  interactions  that  link  the 
numerous  hierarchical  levels  of  organization  from  DNA  sequence  to  integrated  behavior 
of  the  intact  organism  in  its  environment.  Understanding  the  evolution  and  regulation  of 
gene  circuitry  is  at  the  heart  of  the  matter. 


Hierarchies  from  Genotype  to  Phenotype 

The  behavior  of  an  intact  biological  system  can  seldom  be  related  directly  to  its  underlying 
molecular  determinants.  There  are  several  different  levels  of  hierarchical  organization 
that  intervene  -  genome  sequence,  transcriptional  unit,  mode  of  control,  logic  of  control, 
expression  cascade,  connectivity,  and  environmental  context. 

The genome sequence consists of the four bases - A, T, G, and C - strung together in
seemingly  endless  variation.  However,  genetic  analysis  has  shown  that  a  basic  grammar 
defines  how  this  alphabet  is  organized  into  meaningful  units  of  transcription.  These  units 
consist  of  structural  genes  bounded  by  initiation  and  termination  sites  with  a  number  of 
adjacent  regulatory  sites  capable  of  binding  specific  transcription  factors  that  modulate 
the  rate  of  transcription  initiation  or  termination. 

The molecular mode of control by which individual regulator proteins affect the transcriptional
process can be either positive or negative. For example, induction of gene
expression can be achieved either by supplying the activator of a quiescent process or by
removing the repressor of a constitutively active process. A particular set of such regulators,
each with its own mode of action, can act in a combinatorial fashion on a single
transcriptional  unit  to  achieve  control  according  to  a  specific  logical  function. 

The  transcription  of  a  gene  is  only  the  first  of  many  steps  in  a  cascade  of  information 
flow  from  DNA  to  RNA  to  protein  to  metabolite  that  constitutes  expression  of  a  gene. 
These  expression  cascades  are  interconnected  because  the  products  of  one  cascade  act  as 
regulators of other cascades. In fact, it is the topological connectivity of the elementary
expression  cascades  that  constitutes  the  regulatory  gene  circuitry  of  the  cell. 

The environment provides the context that must be interpreted by the circuitry in
order to produce a pattern of gene expression conducive to the organism's survival. These
several  levels  of  organization  have  been  abstracted  from  what  is  known  of  gene  expression 
in  both  prokaryotes  and  eukaryotes,  but  for  testing  our  predictions  we  shall  henceforth 
focus our attention on the lac operon of E. coli.


Lac Expression and Underlying Circuitry

It  has  long  been  known  that  expression  of  the  lac  operon  can  be  induced  by  growth  on 
lactose  as  a  carbon  and  energy  source  [2].  However,  the  elucidation  of  the  underlying 
circuitry  involved  the  experimental  exploitation  of  mutants  and  gratuitous  inducers  that 
allowed  specific  aspects  of  the  regulatory  mechanisms  to  be  identified  and  characterized 
[3].  For  example,  the  first  evidence  of  cooperativity  in  the  action  of  lac  repressor  was 
uncovered  through  the  use  of  a  non-metabolizable  inducer  and  a  transport  mutant  that 
together  allowed  the  intracellular  concentration  of  inducer  to  be  specified  experimentally 
via  the  extracellular  environment  [4]. 

Induction of the wild-type operon by gratuitous inducers provided evidence for a dynamic
switch that exhibits hysteresis as shown in Figure 1B. When there is a low level
of expression, substrate concentration must be increased above level a before there is an
abrupt switch to a high level of expression; when there is a high level of expression, substrate
concentration must be decreased below level b before there is an abrupt switch to a
low  level  of  expression.  Thus,  at  intermediate  concentrations  of  substrate,  such  as  level  c, 
the  level  of  expression  can  be  either  high  or  low,  depending  upon  the  past  history  of  the 
substrate  concentration.  The  original  explanation  for  this  behavior  focused  on  the  positive 
feedback  resulting  from  induction  of  the  lac-encoded  transport  system  [5].  This  continued 
to  be  the  accepted  explanation  even  as  the  details  of  lac  circuitry  were  characterized  and 
a  kinetic  model  developed. 
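
The history dependence described above can be illustrated with a minimal sketch. The thresholds `a` and `b` stand for the switching levels in Figure 1B; the two-state idealization, the function name `expression_states`, and the numerical values are assumptions introduced here for illustration only and are not part of the kinetic model.

```python
def expression_states(substrate_levels, a, b, initial_high=False):
    """Idealized two-state hysteretic switch (illustrative assumption).

    a: level above which a low-expression system switches to high.
    b: level below which a high-expression system switches to low (b < a).
    Returns the state ("high" or "low") reached after each substrate level.
    """
    if not b < a:
        raise ValueError("hysteresis requires b < a")
    high = initial_high
    states = []
    for s in substrate_levels:
        if not high and s > a:
            high = True       # abrupt switch up once level a is exceeded
        elif high and s < b:
            high = False      # abrupt switch down once level falls below b
        states.append("high" if high else "low")
    return states


# Hypothetical levels chosen so that b = 1.0 < c = 2.0 < a = 3.0.
rising = expression_states([0.5, 2.0, 3.5], a=3.0, b=1.0)
falling = expression_states([3.5, 2.0, 0.5], a=3.0, b=1.0, initial_high=True)
```

At the intermediate level c = 2.0 the rising sequence is still in the low state while the falling sequence is still in the high state, which is the dependence on past history that the text describes.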



Figure  1:  Alternative  to  the  current  kinetic  model  of  the  lac  operon.  (A)  Schematic 
diagram  showing  the  extra  fate  for  the  inducer  allolactose.  (B)  Steady-state  induction 
characteristic  of  the  model  in  panel  A  exhibits  hysteresis. 

Recent analysis has demonstrated that the current kinetic model of the wild-type
operon is capable of the dynamic switch that generates hysteresis in response to gratuitous
inducers but not in response to the natural substrate lactose [6]. It is experimentally
more difficult to obtain the steady-state induction characteristic with the natural substrate,
so it is perhaps not surprising that such experiments are difficult to find in the
literature. Special care must be taken to design experiments that are sufficiently long-term
to ensure a steady state and with sufficiently low cell densities to ensure negligible
consumption of substrate. Carefully executed experiments of this kind have demonstrated
a hysteretic response to lactose [7]. These two facts - experimental demonstration of
hysteresis and mathematical demonstration that the current model is incapable of such
behavior - indicate that the current model of the lac operon is inadequate.

We  have  proposed  an  alternative  model  (Figure  1A)  in  which  the  inducer  has  an 
additional fate, not inducible by allolactose, that accounts for the dynamic switch in
expression [6] when the additional flux is greater than a well-defined minimum [8]. We
also  have  found  independent  experimental  evidence  in  the  literature  that  lends  support  to 
this  model.  The  meaning  of  these  results  will  become  clear  later  in  the  context  of  results 
from  other  analyses. 


Molecular Mode of Control and Demand for Lac Gene Expression

The  conclusions  from  the  analysis  described  in  the  previous  section  link  knowledge  of  the 
organism’s physiology to knowledge of the underlying gene circuitry. The analysis in this
section  will  link  knowledge  of  the  organism’s  genome  and  molecular  mode  of  control  to 
knowledge  of  the  demand  for  gene  expression  during  the  organism’s  life  cycle. 

The  life  cycle  of  E.  coli  involves  sequential  colonization  of  new  host  organisms  [9], 
which means repeated cycling between two different environments (Figures 2A and 2B).
In one, the upper portion of the host’s intestinal tract, the microbe is exposed to the
substrate lactose and there is a high demand for expression of the lac operon, and in the
other, the lower portion of the intestinal tract and the environment external to the host,
the  microbe  is  not  exposed  to  lactose  and  there  is  a  low  demand  for  lac  expression.  The 
average  time  to  complete  a  cycle  through  these  two  environments  is  defined  as  the  cycle 
time,  C,  and  the  average  fraction  of  the  cycle  time  spent  in  the  high-demand  environment 
is  defined  as  the  demand  for  gene  expression,  D  (Figure  2C). 

The  implications  for  gene  expression  of  mutant  and  wild-type  operons  in  the  high- 
and  low-demand  environments  are  as  follows.  The  wild-type  functions  by  turning  on 
expression in the high-demand environment and turning off expression in the low-demand
environment.  The  mutant  with  a  defective  promoter  is  unable  to  turn  on  expression  in 
either  environment.  The  mutant  with  a  defective  modulator  (which  can  also  stand  in  for  a 
mutant  with  a  defective  regulator  protein)  is  unable  to  turn  off  expression  regardless  of  the 
environment.  The  double  mutant  with  defects  in  both  promoter  and  modulator  behaves 
like  the  promoter  mutant  and  is  unable  to  turn  on  expression  in  either  environment. 

The  mutation  rates  between  these  populations  depend  on  the  mutation  rate  per  base 
and  on  the  size  of  the  relevant  target  sequence.  The  population  sizes  also  are  dependent 
upon  selection. 

There is selection against mutants of the modulator type in the low-demand environment
because there is superfluous expression of an unneeded function. There is selection
against  mutants  of  the  promoter  type  in  the  high-demand  environment  because  expression 
of  a  needed  function  is  lacking. 
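
The expression patterns and selective disadvantages described above can be tabulated in a short sketch. The on/off (Boolean) idealization and the names `EXPRESSION` and `selected_against` are assumptions made here for illustration; the quantitative theory in [10] works with population dynamics and selection coefficients rather than a lookup table.

```python
# Expression (True = on) of each genotype in each environment, following the
# qualitative description in the text. The Boolean idealization is an assumption.
EXPRESSION = {
    "wild-type":        {"high": True,  "low": False},
    "promoter mutant":  {"high": False, "low": False},
    "modulator mutant": {"high": True,  "low": True},
    "double mutant":    {"high": False, "low": False},
}


def selected_against(genotype, environment):
    """Return why a genotype is at a disadvantage in an environment, or None."""
    expressed = EXPRESSION[genotype][environment]
    needed = environment == "high"   # expression is needed only under high demand
    if expressed and not needed:
        return "superfluous expression of an unneeded function"
    if needed and not expressed:
        return "lack of expression of a needed function"
    return None


for genotype in EXPRESSION:
    for environment in ("high", "low"):
        print(genotype, environment, selected_against(genotype, environment))
```

In this idealization only the wild type avoids a disadvantage in both environments, which is why cycling between the two environments is needed for its control mechanism to be selected.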

Solution  of  the  dynamic  equations  for  each  of  the  populations  cycling  through  the  two 
environments  yields  equations  in  C  and  D  for  the  threshold,  extent,  and  rate  of  selection 
for  the  wild-type  control  mechanism  [10].  The  threshold  for  selection  is  shown  by  the 
shaded  region  in  Figure  2D;  only  systems  with  values  of  C  and  D  that  fall  within  this 
region  are  capable  of  being  selected.  The  rate  and  extent  of  selection  shown  in  Figure  2E 
and  2F  exhibit  optimum  values  for  a  specific  value  of  D. 

Application  of  this  theory  to  the  lac  operon  of  E.  coli  yields  several  new  and  provocative 
predictions  that  relate  genotype  to  phenotype  [11]. 

The straight line in Figure 2D represents the inverse relationship C = 3/D (with C in hours) that results
from fixing the time of exposure to lactose at 3 hours, which is the clinically determined
value for humans [12, 13]. The intersections of this line with the two thresholds for selection
provide lower and upper bounds on the cycle time. The lower bound is approximately 24
hours, which is about as fast as the microbe can cycle through the intestinal tract without
colonization [14, 15, 16]. The upper bound is approximately 70 years, which is the longest
period  of  colonization  without  cycling  and  corresponds  favorably  with  the  maximum  life 
span of the host [17]. The optimum value for the cycle time, corresponding to the optimum
value for demand (from Figures 2E and 2F), is approximately four months; this value is
comparable with the average rate of recolonization measured in humans [18, 19, 20].

Figure 2: Life cycle and demand for gene expression. (A) Schematic diagram of the
upper (high demand) and lower (low demand) portions of the human intestinal tract. (B)
Life cycle consists of repeated cycling between high- and low-demand environments. (C)
Definition of cycle time C and demand for gene expression D. (D) Region in the C vs.
D plot for which selection of the wild-type control mechanism is possible. (E) Rate of
selection as a function of demand. (F) Extent of selection as a function of demand. See
text for discussion.
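
The inverse relationship between cycle time and demand used above can be checked with a small calculation. This is only a sketch: the 3-hour exposure time and the three cycle times (24 hours, four months, 70 years) come from the text, whereas the function names and the conversion factors for months and years are assumptions introduced here.

```python
EXPOSURE_HOURS = 3.0               # time of exposure to lactose per cycle (from the text)
HOURS_PER_YEAR = 365.25 * 24.0     # conversion factor, an assumption of this sketch
HOURS_PER_MONTH = HOURS_PER_YEAR / 12.0


def demand(cycle_hours):
    """Fraction D of the cycle time C spent in the high-demand environment."""
    if cycle_hours < EXPOSURE_HOURS:
        raise ValueError("cycle time cannot be shorter than the exposure time")
    return EXPOSURE_HOURS / cycle_hours


def cycle_time_hours(d):
    """Cycle time C (in hours) implied by demand D when exposure is fixed."""
    if not 0.0 < d <= 1.0:
        raise ValueError("demand must lie in (0, 1]")
    return EXPOSURE_HOURS / d


cases = [
    ("lower bound, 24 hours", 24.0),
    ("optimum, four months", 4.0 * HOURS_PER_MONTH),
    ("upper bound, 70 years", 70.0 * HOURS_PER_YEAR),
]
for label, hours in cases:
    print(f"{label}: D = {demand(hours):.2e}")
```

Under these assumptions the three cycle times correspond to demands of roughly 0.125, 1e-3 and 5e-6, which indicates how many orders of magnitude of D are spanned by the region of selection along the line C = 3/D.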

Logic  and  Phasing  of  Lac  Control 

The  analysis  in  the  previous  section  assumed  that  when  E.  coli  was  growing  on  lactose 
there  was  no  other  more  preferred  carbon  source  present.  Thus,  the  positive  CAP-cAMP 
regulator  [21]  was  always  present,  and  we  could  then  concentrate  on  the  conditions  for 
selection  of  the  specific  control  by  Lac  repressor.  This  was  a  simplifying  assumption;  in 
the  general  situation,  both  the  specific  control  by  Lac  repressor  and  the  global  control 
by  CAP-cAMP  activator  must  be  taken  into  consideration.  The  analysis  becomes  more 
complex,  but  it  follows  closely  the  outline  of  the  simpler  case  in  the  previous  section. 

By extension of the definition for demand D, given in the previous section, one can
define  a  period  of  demand  for  the  absence  of  repressor  G,  a  period  of  demand  for  the 
presence  of  activator  E,  and  a  phase  relationship  between  these  two  periods  of  demand  F. 
By  extension  of  the  analysis  in  the  previous  section,  solution  of  the  dynamic  equations  for 
the  wild-type  and  each  of  the  mutant  populations  cycling  through  the  two  environments 
yields equations in C, G, E, and F for the threshold, extent, and rate of selection for the
wild-type control mechanism [22].

The threshold for selection is now an envelope surrounding a mound in four-dimensional
space with cycle time C as a function of the three parameters G, E, and F;
only  systems  with  values  that  fall  within  this  envelope  are  capable  of  being  selected.  The 
rate  and  extent  of  selection  exhibit  optimum  values  as  before,  but  these  now  occur  with  a 
specific  combination  of  values  for  G,  E,  and  F.  The  values  of  G,  E,  and  F  that  yield  the 
optima  represent  a  small  period  when  repressor  is  absent,  an  even  smaller  period  when 
activator is present, and a large phase period between them. The period when repressor
is absent corresponds to the period of exposure to lactose. Within this period there
is  a  shorter  period  when  activator  is  absent;  this  corresponds  to  the  presence  of  a  more 
preferred  carbon  source  that  lowers  the  level  of  cAMP. 
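
The combined logic of the two regulators can be written as a simple Boolean sketch. It assumes an all-or-none idealization in which transcription requires the absence of repressor and the presence of activator; the function names are hypothetical, and the mapping from environment to regulator state follows the qualitative statements in the text rather than any quantitative model.

```python
def lac_transcription(repressor_present, activator_present):
    """Idealized logic: on only when repressor is absent AND activator is present."""
    return activator_present and not repressor_present


def lac_expressed(lactose_present, preferred_carbon_present):
    """Map the environment onto regulator states (Boolean idealization)."""
    # Exposure to lactose corresponds to the period when repressor is absent.
    repressor_present = not lactose_present
    # A more preferred carbon source lowers cAMP, so the activator is absent.
    activator_present = not preferred_carbon_present
    return lac_transcription(repressor_present, activator_present)


for lactose in (False, True):
    for preferred in (False, True):
        print(f"lactose={lactose}, preferred carbon={preferred}: "
              f"expressed={lac_expressed(lactose, preferred)}")
```

Only the combination of lactose present and preferred carbon source absent gives expression in this sketch, which matches the two periods of demand G and E and makes clear why their relative phasing F matters.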

These  relationships  can  be  interpreted  in  terms  of  exposure  to  lactose,  exposure  to 
glucose,  and  expression  of  the  lac  operon  as  shown  in  Figure  3.  The  initial  exposure  to 
lactose  leads  to  an  accumulation  of  the  natural  inducer  allolactose  and  hence  to  induced 
expression of β-galactosidase. The result is a greatly increased synthesis of the products
allolactose, glucose and galactose. These products are preferentially excreted back into the
extracellular environment, in agreement with our proposed model for hysteretic expression
of the lac operon. The extracellular glucose causes catabolite repression and lactose
exclusion, thereby initiating a period of growth on glucose. During this period the activator
CAP-cAMP is absent, transcription of the lac operon ceases and the concentration of
β-galactosidase is diluted by growth, and lactose is spared. Eventually, glucose becomes
depleted, the residual lactose causes a diminished secondary induction of β-galactosidase,
and  the  microbe  enters  the  low-demand  environment  as  the  lactose  is  exhausted. 


Discussion 

The results of the three different analyses described in the preceding sections are remarkably
self-consistent and supported by a diverse set of independent experimental observations.

First, the analysis of lac circuitry showed that the conventional kinetic model is inadequate
in that it is incapable of producing all-or-none expression of the lac operon. The
new model we have proposed includes a non-inducible pathway for removal of inducer.
There is indeed independent experimental evidence to support this model [23]. The natural
inducer allolactose as well as the products of its hydrolysis are rapidly excreted from
the cell. Moreover, studies using mutants have shown that internally generated glucose
is incapable of being used efficiently by the cell, whereas the excreted glucose, which is
phosphorylated as it is transported back into the cell, is used with high efficiency. From
these results one can see that the pathway for utilization of lactose involves induction of
transport and catabolism of lactose, efflux of the glucose and galactose produced, transport
of the external glucose and galactose back into the cell, and finally entry into the
pathways  of  intermediary  metabolism.  The  catabolite  repression  caused  by  glucose  acts 
as  a  negative  feedback  mechanism  to  moderate  the  overall  rate  of  lactose  utilization. 

Second,  the  quantitative  version  of  demand  theory  integrates  information  at  the  level 
of  DNA  (mutation  rate,  effective  target  sizes  for  mutation  of  regulatory  proteins,  promoter 
sites,  and  modulator  sites),  physiology  (selection  coefficients  for  superfluous  expression  of 
an  unneeded  function  and  for  lack  of  expression  of  an  essential  function),  and  ecology 
(environmental  context  and  life  cycle)  and  makes  rather  surprising  predictions  connected 
to  the  intestinal  physiology  and  life  span  of  the  host  and  to  the  rate  for  recolonizing  the 
host. There are independent experimental data to support each of these predictions.

Finally,  when  the  logic  of  combined  control  by  CAP-cAMP  activator  and  Lac  repressor 
was  analyzed,  we  found  an  optimum  set  of  values  not  only  for  the  exposure  to  lactose, 
but  also  for  the  exposure  to  glucose  and  for  the  relative  phasing  between  the  periods  of 
exposure.  The  phasing  predicted  is  consistent  with  a  self-generated  glucose  effect  produced 
by  catabolism  of  lactose  and  excretion  of  glucose.  These  results  are  supported  by  the 
same  experimental  data  noted  above  in  connection  with  the  hysteretic  expression  of  the 
lac  operon. 

The  results  from  all  three  analyses  fit  together  nicely.  In  the  end,  we  are  able  to  relate 
information  in  the  nucleotide  sequence  of  the  lac  operon  to  its  specific  pattern  of  expression 
in  time  and  space.  In  the  process  we  have  made  use  of  information  at  several  other  levels 
of organization, including important information about the host that provides the ecological
niche for the microbe. While much of the necessary information could in principle
be deduced from the underlying sequence, some would still require a systemic integration
at the level of the intact organism and its environment. From this perspective it is clear
that  a  completely  reductionist  deduction  of  function  solely  from  information  in  the  DNA 
sequence  will  be  unattainable  in  most  cases.  Aside  from  those  few  organisms  that  have 
a  relatively  self-contained  developmental  program,  functional  genomics  will  ultimately  be 
concerned with the genomes of multiple organisms undergoing mutual interaction and
coevolution.

Figure 3: Schematic interpretation of optimal phasing of β-galactosidase expression in
terms of exposure to lactose and glucose. See text for discussion.


Abstract

Motivation:  When  dealing  with  questions  that  concern 
a  general  class  of  models  for  biological  networks, 
large  numbers  of  distinct  models  within  the  class  can 
be  grouped  into  an  ensemble  that  gives  a  statistical 
view  of  the  properties  for  the  general  class.  Comparing 
properties  of  different  ensembles  through  the  use  of  point 
measures (e.g. medians, standard deviations, correlation
coefficients)  can  mask  inhomogeneities  in  the  correlations 
between  properties.  We  are  therefore  motivated  to  develop 
strategies  that  allow  these  inhomogeneities  to  be  more 
easily  detected. 

Results: Methods are described for constructing ensembles
of models within the context of a Mathematically Controlled
Comparison. A Density of Ratios Plot for a given
systemic property is then defined as follows: the y axis represents
the value of the systemic property in a reference
model divided by the value in the alternative model, and
the x axis represents the value of the systemic property in
the reference model. Techniques involving moving quantiles
are introduced to generate secondary plots in which
correlations and inhomogeneities in correlations are more
easily detected. Several examples that illustrate the advantages
of these techniques are presented and discussed.
Contact:  Savageau@umich.edu 

Introduction 

The  only  rigorous  way  to  characterize  and  compare 
alternative  biological  designs  for  a  particular  class  of 
systems  is  through  the  use  of  mathematical  models  and 
quantitative  methods  of  analysis.  In  pursuing  these  goals 
we  must  address  three  critical  issues.  First,  biologically 
meaningful behaviors must be identified (or, as is more
commonly the case, hypothesized) and characterized by
quantitative measures. Second, a representation of the
alternatives must be capable of describing the phenomena
of interest in quantitative terms. Third, comparisons will
require analyses that explore a range of parameter values
and use statistical methods to evaluate the results.

*To whom correspondence should be addressed.

The  first  issue  is  obviously  critical  if  the  results  are  to  be 
biologically  significant;  however,  there  is  no  prescription 
for  discovering  those  biological  behaviors  that  are  based 
on  natural  selection  or  those  that  occur  at  random  with 
high probability. The behaviors that are important characteristics
of a given biological system can only be discovered
by experimental means. Hypotheses must be generated
and tested in each case, and this process will vary
considerably according to the systems being studied. The
behavioral repertoire of nonlinear systems can be quite diverse,
including saturation, thresholds, memory, time delays,
synchrony, stable limit cycles and strange attractors.

The second issue is critical to any quantitative comparison
of alternative systems. We require a mathematical language
(or formalism) that is sufficiently flexible to represent
the diverse behaviors that are likely to be encountered
in the quantitative description of a nonlinear biological
system. The power-law formalism (Savageau, 1996) is a
most likely candidate for this language. It can be viewed as
a canonical nonlinear representation from three different
perspectives. From a fundamental perspective, it provides
a generalization of mass-action kinetics, which is the most
widely used representation of biological systems at the
molecular level. From a recasting perspective, it provides
a globally accurate representation that can be made mathematically
equivalent to any sufficiently differentiable nonlinear
system. From a local perspective, it provides a general
representation that is guaranteed to be accurate over a
range of variation about a nominal operating point.
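
The local perspective can be made concrete with a small sketch. As an illustrative assumption, a Michaelis-Menten rate law is taken as the underlying nonlinear function and approximated by a single power-law term about a nominal operating point; the function names and parameter values are hypothetical and are not taken from the paper.

```python
def michaelis_menten(x, vmax, km):
    """Underlying nonlinear rate law (chosen here only as an example)."""
    return vmax * x / (km + x)


def power_law_fit(x0, vmax, km):
    """Local power-law representation v ~ alpha * x**g about operating point x0.

    g is the kinetic order, d ln v / d ln x evaluated at x0; for the
    Michaelis-Menten law this equals km / (km + x0). alpha is chosen so that
    the power law matches the rate law exactly at x0.
    """
    g = km / (km + x0)
    alpha = michaelis_menten(x0, vmax, km) / x0 ** g
    return alpha, g


def power_law(x, alpha, g):
    return alpha * x ** g


# Hypothetical parameter values and operating point.
alpha, g = power_law_fit(x0=1.0, vmax=10.0, km=1.0)
for x in (0.5, 0.9, 1.0, 1.1, 2.0):
    exact = michaelis_menten(x, 10.0, 1.0)
    approx = power_law(x, alpha, g)
    print(f"x={x}: exact={exact:.3f}, power law={approx:.3f}")
```

The approximation is exact at the operating point and degrades gradually as x moves away from it, which is the sense in which the local representation is accurate over a range of variation.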


©  Oxford  University  Press  2000 




R. Alves  and  M.A.Savageau 


The  third  issue  is  critical  because  values  for  many  of 
the  parameters  in  any  given  complex  system  will  not  have 
been measured, and for those that have, the estimates will
often  be  poor.  Moreover,  even  if  we  had  a  complete  set 
of  accurate  parameter  values  with  which  to  study  the 
behavior  of  a  system,  the  results  would  only  apply  to  that 
particular  system.  In  any  case,  we  would  have  to  vary  the 
parameters  over  a  range  of  values  and  statistically  analyze 
the  results  to  determine  the  properties  of  the  general  class 
of  systems  to  which  the  particular  system  belongs. 
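
A minimal sketch of this kind of statistical comparison is given below. It builds the points of a ratio plot as defined in the abstract (x is the value of a systemic property in the reference model, y is that value divided by the value in the alternative model) and smooths them with a moving median, used here as one example of a moving quantile. The toy property, the sampling ranges, the window size, and all function names are assumptions for illustration and are not the models or settings used in the paper.

```python
import random
import statistics


def ratio_points(reference_values, alternative_values):
    """(x, y) points: x = reference value, y = reference / alternative."""
    return [(r, r / a) for r, a in zip(reference_values, alternative_values)]


def moving_median(points, window):
    """Sort points by x, then take medians of x and y over a sliding window."""
    pts = sorted(points)
    trend = []
    for i in range(len(pts) - window + 1):
        chunk = pts[i:i + window]
        trend.append((statistics.median(x for x, _ in chunk),
                      statistics.median(y for _, y in chunk)))
    return trend


# Toy ensemble: parameters sampled log-uniformly over two orders of magnitude.
rng = random.Random(0)
params = [(10 ** rng.uniform(-1, 1), 10 ** rng.uniform(-1, 1)) for _ in range(200)]
reference = [k1 / (1.0 + k2) for k1, k2 in params]   # hypothetical systemic property
alternative = [k1 for k1, _ in params]               # same property without the k2 term
points = ratio_points(reference, alternative)
trend = moving_median(points, window=21)
```

Plotting `trend` on top of `points` would show whether the ratio varies systematically with the value in the reference model, which a single point measure such as the overall median could mask.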

Our  purpose  in  this  and  the  following  paper  is  to  present 
a methodology for dealing with this third issue and to illustrate
its use in the simplest setting where the essentials
of the methodology can be made most transparent. Hence,
we shall focus on a class of systems for which the biologically
relevant behavior is relatively simple and well defined
(namely, unbranched amino-acid biosynthetic pathways
with a single homeostatic steady state) and for which
the local nonlinear representation, which is the simplest of
the representations within the power-law formalism, is appropriate.
At the end of the second paper we will return
to these issues and indicate how the methods presented
here might be applied to systems with more complex behaviors
requiring more general representations within the
power-law formalism. The methods themselves provide an
extension  of  a  previously  developed  approach  for  making 
well-controlled  comparisons. 

In the study of complex biological networks, models
with alternative designs or structure are often compared
to determine which of them provides the better
representation for some observed phenomenon (e.g. Ni and
Savageau, 1996). When comparing structurally different
models for the same phenomenon, it is difficult to know
whether the differences observed are accidental or
inherent differences that can be attributed specifically to
the alternative designs. The method of Mathematically
Controlled Comparison (Savageau, 1972; for a review see
Irvine, 1991) was proposed to address this issue.

In brief, the steps involved in this method are as follows.
First, mathematical models are formulated for the alternative
designs being compared, for example, a biosynthetic
pathway with end-product inhibition and an identical one
without it. One model, generally the more complex, is
designated the reference; the other is designated the
alternative. Second, the parameters of the alternative model are
fixed relative to those of the reference model. Each process
in the alternative model that is identical to one in the
reference model is assigned a set of parameter values that
is identical to the corresponding set in the reference model.
This is referred to as internal equivalence. Each process in
the alternative model that is different from the corresponding
process in the reference model will have a set of parameter
values that is unique to the alternative model, and
these parameters represent degrees of freedom that must
be constrained in an effort to reduce the accidental
differences between the models. Each constraint is established
by equating the expressions for a systemic property
common to the two models. The set of constraint equations is
then solved to determine values for the unique parameters
of the alternative model in terms of values for the parameters
of the reference model. This is referred to as external
equivalence. Finally, having eliminated all the degrees of
freedom, the two models are analyzed to determine the
differences that remain.

The  critical  step  in  this  method  is  the  solution  of 
the  constraint  equations.  The  models  are  described  by 
nonlinear  equations  that  in  general  have  no  analytical 
solution.  However,  the  discovery  of  a  canonical  nonlinear 
representation  that  is  locally  valid  and  amenable  to 
analytical  solution  (Savageau,  1969a,  1969b;  for  a  review 
see  Savageau,  1996)  removes  the  difficulty  associated  with 
this  critical  step  in  many  cases  (Savageau,  1972,  1976). 
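This canonical nonlinear representation within the power-law
formalism is referred to as an S-system and it has the
following systematic structure:

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n+m} X_j^{h_{ij}}, \qquad i = 1, 2, \ldots, n \qquad (1)$$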


For each dependent concentration $X_i$ in a biochemical
model there exists an aggregate production function
and an aggregate consumption function. These aggregate
functions are approximated by a first-order Taylor series
in a logarithmic space, which in Cartesian space corresponds
to the product of power-law functions. An exponent of
zero for any $X_j$ means that that variable has no direct
influence  on  the  rate  of  the  corresponding  aggregate 
process,  a  positive  exponent  means  that  the  variable  and 
the  rate  of  the  aggregate  process  are  positively  correlated, 
and  a  negative  exponent  means  that  they  are  negatively 
correlated.  In  a  steady  state,  (1)  becomes  a  linear  equation 
in  logarithmic  space  and  can  be  solved  analytically. 
Likewise,  various  systemic  properties  can  be  calculated 
analytically  and  used  to  form  constraints  by  equating 
the  analytical  expressions  for  corresponding  systemic 
properties  in  the  two  models.  These  constraint  equations 
can  then  be  solved  to  determine  values  for  the  unique 
parameters  of  the  alternative  model  in  terms  of  values  for 
the  parameters  of  the  reference  model. 
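The steady-state calculation described above can be sketched numerically. The following Python fragment is a minimal illustration, assuming the usual S-system form with rate constants alpha_i and beta_i and kinetic-order matrices G and H. The function name, the two-variable example pathway and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def s_system_steady_state(alpha, beta, G, H, x_indep):
    """Steady state of an S-system, solved as a linear problem in log space.

    alpha, beta : rate constants of the n aggregate production and
                  consumption terms (all positive).
    G, H        : n x (n + m) arrays of kinetic orders; columns 0..n-1 refer
                  to dependent variables, the remaining m to independent ones.
    x_indep     : values of the m independent variables (all positive).
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    A = np.asarray(G, dtype=float) - np.asarray(H, dtype=float)
    n = alpha.size
    # Steady state: ln(alpha_i) + sum_j g_ij y_j = ln(beta_i) + sum_j h_ij y_j,
    # with y_j = ln(X_j), i.e. A y = ln(beta / alpha).
    b = np.log(beta / alpha)
    y_indep = np.log(np.asarray(x_indep, dtype=float))
    y_dep = np.linalg.solve(A[:, :n], b - A[:, n:] @ y_indep)
    return np.exp(y_dep)


# Hypothetical two-step pathway: X3 (independent) -> X1 -> X2 -> ...,
# with X2 inhibiting the first step (negative kinetic order g12).
alpha = [2.0, 1.0]
beta = [1.0, 0.5]
G = [[0.0, -0.5, 1.0],   # production of X1 depends on X2 and X3
     [1.0, 0.0, 0.0]]    # production of X2 depends on X1
H = [[1.0, 0.0, 0.0],    # consumption of X1 depends on X1
     [0.0, 1.0, 0.0]]    # consumption of X2 depends on X2
x_steady = s_system_steady_state(alpha, beta, G, H, x_indep=[1.0])
print(x_steady)
```

Because the problem is linear in the logarithms, no iterative root finding is needed; the same matrix can also be reused to obtain other systemic properties analytically.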

Once  internal  and  external  equivalence  between  the 
models  is  established  in  this  manner,  we  can  proceed  to 
analyze  the  models  and  compare  their  systemic  behaviors 
by  taking  ratios  of  their  corresponding  properties.  The 
steady-state  properties  that  are  typically  analyzed  in 
Mathematically Controlled Comparisons include concentrations,
fluxes, logarithmic gains, parameter sensitivities,
and stability margins. For the purposes of this paper, these
systemic properties will be represented by M. The ratio
of M in the reference model to M in the alternative model
exhibits one of three possible properties.

1. The analytical ratio of M values is always equal to 1,
independent of parameter values. This means that
the property being analyzed is always the same in
the two models.

2. The analytical ratio of M values is always larger
(smaller) than 1, independent of parameter values.
This means that the property being analyzed is
always larger (smaller) in the reference model than
in the alternative model. However, if the numerical
values for the parameters are not known we cannot
say how much larger (smaller) the property is.

3.  The  analytical  ratio  of  M  values  is  larger  or  smaller 
than  1,  depending  on  the  parameter  values.  In  this 
case  it  is  difficult  to  say  anything  about  the  property 
by  simple  examination  of  the  analytical  ratio. 

The  uncertainties  associated  with  properties  2  and  3  will 
be  addressed  by  the  numerical  methods  being  proposed  in 
this  paper.  Moreover,  these  methods  will  allow  us  to  draw 
statistical  conclusions  about  the  relative  merits  of  various 
biological  designs. 

Methods 

If  we  knew  the  numerical  values  for  all  the  parameters  of 
the  reference  model,  then  we  could  calculate  the  numerical 
values  for  all  the  parameters  of  the  alternative  model  that  is 
internally  and  externally  equivalent.  However,  knowledge 
of  all  the  parameter  values  is  rarely  available  for  any 
model.  Furthermore,  using  just  one  set  of  parameter 
values  restricts  the  interpretation  to  the  specific  pair 
of  models  being  compared.  These  limitations  can  be 
overcome  by  creating  a  large  ensemble  of  reference 
models  with  randomly  generated  sets  of  parameter  values 
that  adequately  sample  the  parameter  space.  For  each  of 
these  one  can  then  construct  the  alternative  model  that  is 
internally  and  externally  equivalent. 

There  are  two  types  of  parameters  that  appear  in 
the  S-system  representation  (equation  (1)):  exponential 
parameters  (kinetic  orders)  and  multiplicative  parameters 
(rate  constants).  The  exponential  parameters,  which  are 
weighted  averages  of  more  elementary  kinetic  orders, 
typically  have  values  less  than  4  in  magnitude  (Voit  and 
Savageau,  1987).  The  multiplicative  parameters,  which 
reflect  the  different  time  scales  present  within  the  model, 
for  most  cases  of  interest  are  within  4  orders  of  magnitude 
of  each  other  (i.e.  within  4  log10  units).  The  results  given 
in  the  following  section  are  not  critically  dependent  upon 
this  particular  choice  of  limits  for  the  parameter  space  that 
needs  to  be  sampled. 


By  using  randomly  generated  numbers  we  can  sample 
the  relevant  parameter  space,  apply  selection  and  create  a 
large  ensemble  of  biologically  relevant  numerical  models 
for  both  the  reference  and  alternative  designs,  and  make 
an  ensemble  of  numerical  comparisons.  The  amount  of 
data  generated  by  this  approach  can  be  overwhelming. 
The  following  subsections  describe  several  ways  to  treat 
and  interpret  these  data.  In  a  following  paper  (Alves 
and  Savageau,  2000)  these  methods  are  applied  to  a 
specific  class  of  biochemical  control  mechanisms  in  a 
context  different  from  that  of  mathematically  controlled 
comparisons.  Subsequent  papers  will  provide  examples  of 
specific  applications  within  the  mathematically  controlled 
comparison  framework. 
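The sampling and selection steps outlined in this section can be illustrated with a short Python sketch. It assumes uniform sampling of kinetic orders within the stated magnitude limit and log-uniform sampling of rate constants over four log10 units; the text does not specify the distributions, so both choices, the function names and the example acceptance rule are hypothetical.

```python
import numpy as np


def sample_parameter_set(n_orders, n_rates, rng, order_limit=4.0, rate_span=4.0):
    """Draw one random parameter set.

    Kinetic orders are uniform in [-order_limit, order_limit]; rate constants
    are log-uniform over `rate_span` log10 units centred on 1.
    """
    orders = rng.uniform(-order_limit, order_limit, size=n_orders)
    log10_rates = rng.uniform(-rate_span / 2.0, rate_span / 2.0, size=n_rates)
    return orders, 10.0 ** log10_rates


def build_ensemble(size, n_orders, n_rates, accept, seed=0):
    """Keep sampling until `size` parameter sets pass the `accept` predicate."""
    rng = np.random.default_rng(seed)
    ensemble = []
    while len(ensemble) < size:
        orders, rates = sample_parameter_set(n_orders, n_rates, rng)
        if accept(orders, rates):
            ensemble.append((orders, rates))
    return ensemble


# Hypothetical selection rule: the first kinetic order must be negative
# (e.g. an inhibitory feedback). A real study would test model behavior here.
ensemble = build_ensemble(1000, n_orders=5, n_rates=4,
                          accept=lambda orders, rates: orders[0] < 0.0)
print(len(ensemble))
```

Separating the sampler from the acceptance predicate keeps the selection criterion explicit, so the same ensemble builder can be reused with different definitions of biological relevance.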

Basic  treatment  and  analysis  of  the  comparisons 

The first problem in analyzing a large number of comparisons
is deciding how to represent the data. Since we are
comparing the value of a given property M between the
reference model and its alternative, one obvious way to
represent the data is by taking the ratio of M in the
reference model to M in the alternative model.

$$R = M_{\text{reference}} / M_{\text{alternative}}. \qquad (2)$$

When  dealing  with  an  ensemble  of  comparisons  we  must 
calculate  the  ratio,  R,  of  M  values  for  each  reference 
model  and  its  alternative  model  that  is  internally  and 
externally  equivalent.  These  data  then  can  be  treated  by 
calculating  some  quantile  of  interest  for  the  ensemble  of 
ratios, thus determining whether M is statistically larger in
the  reference  models  or  their  alternatives.  This,  however, 
will  not  give  us  much  information,  even  if  we  included 
calculations  for  the  dispersion  of  the  results. 

Density  plots 

More information can be obtained from density plots of
R versus M, where M is a property measured in the
reference model; e.g. the sensitivity, $S(X_i, \alpha_1)$, of a given
intermediate, $X_i$, to fluctuations in the rate constant, $\alpha_1$,
for the first reaction of the pathway. Some density plots
where the ratio is typically smaller than 1 are presented in
Figures 1-3. Note that in Figure 1A we have a situation
in which the ratio of $S(X_i, \alpha_1)$ is uniformly scattered
throughout the entire region bounded by R = 1 and
R = 0. Figures 2A and 3A show different non-traditional
distributions. Figure 3A shows a case in which M can take
only discrete values.

Density plots can be used to determine rank correlations
between M and R. Traditionally we calculate non-parametric
rank correlations by using point measures such
as the Spearman or Kendall rank correlation coefficients
(e.g. Wherry, 1984; Krauth, 1988). These methods find
linear and non-linear rank correlations between variables;
however, it is not always easy to find such correlations in
density plots that are as scattered as the ones presented in
Figures 1A or 2A.

Fig. 1. Uncorrelated Density of Ratios Plots. A: Density Plot of R
versus M for two alternative models. There is a uniform distribution
of values on both axes. B: Moving median plot of ⟨R⟩ versus ⟨M⟩
for the data in panel A and a window size of W = 50. C: Moving
median plot of ⟨R⟩ versus ⟨M⟩ for the data in panel A and a window
size of W = 500. See text for discussion.

Fig. 2. Correlated Density of Ratios Plots. A: Density Plot of R
versus M for two alternative models. B: Moving median plot of ⟨R⟩
versus ⟨M⟩ for the data in panel A and a window size of W = 50.
C: Moving median plot of ⟨R⟩ versus ⟨M⟩ for the data in panel A
and a window size of W = 500. See text for discussion.

Fig. 3. Correlated Discrete Density of Ratios Plots. A: Density Plot
of R versus M for two alternative models for which M and R
assume discrete values. B: Moving median plot of ⟨R⟩ versus ⟨M⟩
for the data in panel A and a window size of W = 50. C: Moving
median plot of ⟨R⟩ versus ⟨M⟩ for the data in panel A and a window
size of W = 500. See text for discussion.

The  analysis  of  these  density  plots  using  point  measures 
can  be  almost  as  cumbersome  to  interpret  as  the  results 
from  purely  symbolic  analysis.  Furthermore,  the  point 
measures  will  almost  certainly  hide  information  that 
would  be  available  from  a  less  coarse  analysis.  The 
frequency  of  different  values  in  a  density  plot  is  typically 
analyzed  using  two-  and  three-dimensional  histograms. 
However,  this  approach  may  or  may  not  lead  to  the 
determination  of  standard  statistical  distributions  that  fit 
the  pattern  of  the  data. 

Quantile  analysis  of  density  plots 
Moving quantile techniques allow us to interpret density
plots using either parametric or non-parametric statistics.
However, we shall refer only to the non-parametric case in
the remainder of this paper. Let us assume that we want
to know whether the M values of the reference model are
larger than those of the alternative model (i.e. R > 1) more
often than not. This can be determined from the median of
the ratios, i.e. Quantile 0.5 ($Q_{0.5}$), which will be denoted
⟨R⟩. If for some reason we want to know whether R is
greater than 1 in more than 80% of the cases, then we
would be dealing with $Q_{0.8}$. For the rest of this paper we
will consider only plots of ⟨R⟩ for reasons of simplicity.
The correlation between magnitude M and ratio R can be
obtained from the moving quantile technique instead of
from the point measures technique mentioned above.

The density plot can be viewed as a list of N paired
values. Initially we order the pairs with respect to the
reference magnitude to form a list $L_1$ in which the first
pair has the lowest measured value for M in the reference
model, the second has the second lowest and so on. Next
we build a secondary plot as follows.

Pick a window size W smaller (usually much smaller)
than the sample size N, collect the first W ratios from
the list $L_1$, calculate the $Q_{0.5}$, and pair this number ⟨R⟩
with the median value of the corresponding M values of
the reference model, which will be denoted ⟨M⟩. Advance
the window by one position, collect ratios 2 to W + 1,
calculate ⟨R⟩, and pair it with the corresponding ⟨M⟩
value. Continue this procedure until the last ratio from
the list $L_1$ is used for the first time. We now have a new
list, $L_2$, of size N - W + 1 that is ordered from the
smallest to largest values for ⟨M⟩ of the reference model.
A moving median exhibits the following general statistical
properties. For an infinite ordered population, the moving
median tends to the mean of the population as the window
size W increases without limit. For a finite ordered sample
of size N, the moving median tends to the median of the
sample as W approaches N.
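The construction of the second list from the first can be sketched in a few lines of Python. This is a minimal illustration of the procedure as described, using NumPy's sliding-window view; the function name `moving_quantile` and the toy data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_quantile(m_ref, ratios, window, q=0.5):
    """Return the moving median of M and the moving quantile q of R.

    The (M, R) pairs are first ordered by M (the first list), then a window of
    `window` consecutive pairs is advanced one position at a time, giving
    N - window + 1 output pairs (the second list).
    """
    m_ref = np.asarray(m_ref, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if m_ref.shape != ratios.shape or m_ref.ndim != 1:
        raise ValueError("m_ref and ratios must be 1-D arrays of equal length")
    if not 1 <= window <= m_ref.size:
        raise ValueError("window must lie between 1 and the sample size")
    order = np.argsort(m_ref, kind="stable")      # order the pairs by M
    m_sorted, r_sorted = m_ref[order], ratios[order]
    m_med = np.median(sliding_window_view(m_sorted, window), axis=1)
    r_quant = np.quantile(sliding_window_view(r_sorted, window), q, axis=1)
    return m_med, r_quant


# Toy data: five pairs, window of three, so three output pairs are expected.
m_med, r_med = moving_quantile([3, 1, 2, 5, 4], [0.3, 0.1, 0.2, 0.5, 0.4], 3)
print(m_med, r_med)
```

Passing a different `q` (for example 0.8) gives the other moving quantiles mentioned in the text without changing the windowing logic.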

The plot of list $L_2$ exhibits a moving median ⟨R⟩ on the
y axis that corresponds to the equivalent moving median
⟨M⟩ on the x axis. These moving quantile plots allow us
to determine the percentage of comparisons in which ⟨R⟩
is larger than 1 and, at the same time, whether or not there
is any correlation between ⟨R⟩ and ⟨M⟩ of the reference
model. A slope of zero or infinity in the moving quantile
plot of $L_2$ shows there is no correlation between ⟨R⟩
and ⟨M⟩ of the reference model. Applications of moving
average techniques that are of a more classical nature can
be found in Hamilton (1994) and Huang and Dunsmuir
(1998).

Examples  and  discussion 

Moving median plots of $L_2$ lists can be used to compare
the relative effectiveness of two different classes of models
on the basis of some criterion. For example, assume that
M measures the sensitivity of a model to fluctuations
in a given parameter and that this parameter sensitivity
should be as low as possible according to the criterion
of model robustness. The ratio R of M values in the
reference model to M values in the alternative model,
which is otherwise internally and externally equivalent to
the reference model, is plotted and from this density plot
one forms the moving median plot of ⟨R⟩ versus ⟨M⟩.
Examples of such plots that exhibit various patterns are
presented and their interpretation discussed below.

Figure 1A shows a plot of R versus M with N = 10000
and a uniform scatter in both the R and M values. Since
the scatter is uniform, we would expect to find that ⟨R⟩ is
independent of ⟨M⟩. In this example M values are in the
interval [0, 10] and R values are in the interval [0, 1]. As
the window size grows, the resulting moving median ⟨R⟩
approaches 0.5 with progressively smaller bounds because
0.5 is the median of the sample. Figures 1B and C show
plots of ⟨R⟩ versus ⟨M⟩ for window sizes of 50 and 500,
which can be thought of as the relevant sample size in
this context. Figures 1B and C also show that there is no
correlation between ⟨R⟩ and ⟨M⟩; i.e. the values for the
moving median ⟨R⟩ are independent of the values for ⟨M⟩
of the reference model.
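The uncorrelated case can be imitated with synthetic data. The Python sketch below uses the sample size, intervals and window sizes quoted for this example, but the random numbers are freshly generated with an arbitrary seed and are not the data behind Figure 1; it only illustrates how the range of the moving median narrows as the window grows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)        # arbitrary seed, chosen for repeatability
N = 10_000
M = rng.uniform(0.0, 10.0, size=N)    # property in the reference model
R = rng.uniform(0.0, 1.0, size=N)     # ratio, independent of M by construction

# Order the ratios by the corresponding M values before windowing.
R_sorted = R[np.argsort(M)]

spread = {}
for W in (50, 500):
    moving_median_R = np.median(sliding_window_view(R_sorted, W), axis=1)
    spread[W] = moving_median_R.max() - moving_median_R.min()
    print(f"W = {W}: moving median of R ranges over a width of {spread[W]:.3f}")

print(f"median of the whole sample: {np.median(R):.3f}")
```

With independent uniform samples the moving median scatters around the overall median, and the larger window gives the tighter band, in line with the behavior described for panels B and C.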

It is important to emphasize that different density
plots can have similar moving quantile plots, due to the
statistical nature of quantiles. For example, if R and M
were both normally distributed and uncorrelated, then the
moving median plots of ⟨R⟩ versus ⟨M⟩ would be similar
to those in Figures 1B and C for the same sample and
window sizes.

Figure 2A shows a plot in which R is sometimes larger
than 1. However, the moving quantile plots for $Q_{0.5}$ in
Figures 2B and C show that in most cases the value for
M of the reference model is smaller than that of the
alternative model (⟨R⟩ less than 1). Also, there is a clear
correlation between the value for ⟨R⟩ and the value for
⟨M⟩ of the reference model, which is unlike the case in
Figures 1B and C. The value for ⟨R⟩ is a function of ⟨M⟩
with a minimum around ⟨M⟩ ≈ 1. With ⟨M⟩ ≈ 1, the
value for M of the reference model is much less than that
of the alternative model. With values for ⟨M⟩ that are
increasing or decreasing away from 1, the value for M
of the reference model approaches that of the alternative
model.

The selection of an appropriate window size is critical.
If W is too small (e.g. 5), the $Q_{0.5}$ plot will not differ
significantly from the raw density plot. If W is too large,
the correlation between ⟨R⟩ and ⟨M⟩ will be lost, or at
least attenuated. This can be seen by comparing the curves
for the two different window sizes in Figures 2B and C. As
the window size increases from 50 to 500, the slope of the
branch for ⟨M⟩ less than 1 decreases (if the window size
is increased further, the slope eventually becomes 0). This
happens because the early samples of R are contaminated
with later samples and the correlation with the lower
values of M is lost. With larger window sizes the slope
of the branch for ⟨M⟩ greater than 1 also decreases. As
W approaches N, the slope of the curve on either side of
⟨M⟩ ≈ 1 tends toward 0 and the $Q_{0.5}$ plot provides no
more information than calculating the median of the entire
sample. Thus, the advantages of a $Q_{0.5}$ plot only become
apparent at intermediate window sizes. There is, to our
knowledge, no good way of deciding the optimal size for
the window W; this depends on the sample size N and on
the nature of the sample itself and must be determined by
trial and error.

Figure 3A illustrates a case in which the values of M can
only assume a finite number of discrete values. Figures 3B
and C show the corresponding plots for ⟨R⟩ versus ⟨M⟩
of the reference model. A correlation between ⟨R⟩ and
⟨M⟩ is evident at low values of ⟨M⟩ but disappears as
⟨M⟩ increases. In addition, the $Q_{0.5}$ plot in Figures 3B
and C shows the dispersion in the moving median at each
value of ⟨M⟩, unlike the $Q_{0.5}$ plots in Figures 1B and C
and in Figures 2B and C. This dispersion occurs because
there are several pairs in the list $L_1$ that have the same
discrete value for M but different discrete values for R.
As the window W moves through a series of identical M
values, the median value for M will remain unchanged
whereas the median value for R will change. One can
construct discrete density plots for any of the previous
examples by designating classes for the values of M and
by representing each class by the median of the class
interval. Thus, the plot of ⟨R⟩ versus ⟨M⟩ in cases such as
these can give us information not only about frequencies
and correlations but also about dispersion of the results.

Abstract

Motivation:  Mathematical  models  are  the  only  realistic 
method  for  representing  the  integrated  dynamic  behavior 
of  complex  biochemical  networks.  However,  it  is  difficult 
to  obtain  a  consistent  set  of  values  for  the  parameters  that 
characterize  such  a  model.  Even  when  a  set  of  parameter 
values  exists,  the  accuracy  of  the  individual  values  is 
questionable.  Therefore,  we  were  motivated  to  explore 
statistical  techniques  for  analyzing  the  properties  of  a 
given  model  when  knowledge  of  the  actual  parameter 
values  is  lacking. 

Results: The graphical and statistical methods presented
in the previous paper are applied here to simple unbranched
biosynthetic pathways subject to control by
feedback inhibition. We represent these pathways within
a canonical nonlinear formalism that provides a regular
structure that is convenient for randomly sampling the
parameter space. After constructing a large ensemble
of randomly generated sets of parameter values, the
structural and behavioral properties of the model with
these parameter sets are examined statistically and
classified. The results of our analysis demonstrate that
certain properties of these systems are strongly correlated,
thereby revealing aspects of organization that are highly
probable independent of selection. Finally, we show how
specification of a given behavior affects the distribution
of acceptable parameter values.

Contact:  Savageau@umich.edu 

Introduction 

The characterization of large and complex biochemical
networks cannot be achieved with the direct intuitive
approaches that have been successful for simpler model
systems. The more systematic tools provided by mathematical
modeling and computer analysis have become essential
because they are especially well suited for organizing
large amounts of data and representing nonlinear and
parallel processes.

*To whom correspondence should be addressed.

The most common method of constructing an appropriate
model for a biochemical system has been the reductionist
or bottom-up approach. The component parts are
isolated and characterized, and then the resulting submodels
are assembled into a model of the integrated system.
For example, in the study of metabolic pathways, individual
enzymes were isolated and kinetically characterized in
vitro; pathway models were then constructed by assembly
of the individual rate laws. The fundamental problems
inherent in this approach are three (Ni and Savageau, 1996):

1. failure to identify all the relevant components

2. failure to identify all the relevant interactions

3. failure to determine accurately all the relevant
parameter values.

The associated practical problems are the enormous
numbers of components and interactions that need to be
identified and the difficulty of reproducing the conditions
experienced by the components in their natural setting so
that their parameter values can be accurately determined
in vitro (e.g. Clegg, 1984; Moore et al., 1984; Ovadi
and Srere, 1996; Savageau, 1992; Sorribas et al., 1993).
These problems have limited the success of the bottom-up
approach (e.g. Albe and Wright, 1992; Antunes et al.,
1996; Curto et al., 1998; Ni and Savageau, 1996; Shiraishi
and Savageau, 1993).

An alternative method of constructing an appropriate
model is often termed the reverse-engineering or top-down
approach. Many of the variables are measured in
the intact system, and then one attempts to reconstruct
the underlying model that produced these data. The
fundamental problems in this case are:

1. selecting a mathematical representation that is sufficiently
general so that one can be assured that it will
encompass the system to be characterized

2. the theoretical limits on what can be identified
(problem of identifiability: e.g. Chappell and
Godfrey, 1992; Feng and DiStefano, 1991; Ginn
and Cushman, 1992) when one can only measure
a subset of the variables (problem of observability:
e.g. Moheimani et al., 1996; Xu et al., 1996).

The  practical  problems  are  associated  with  the  limits 
of  the  technologies  currently  available  for  measuring  all 
the  relevant  variables.  Although  the  top-down  approach 
has  long  been  applied  in  simple  cases  that  illustrate 
the  method  (e.g.  Brown  et  al.,  1990;  Diamond,  1975; 
Domnitz,  1976;  Kargi  and  Shuler,  1979;  Quant,  1993; 
Voit  and  Savageau,  1982),  its  use  in  biology  is  currently 
being  driven  by  the  new  techniques  coming  out  of  the 
Human  Genome  Project  that  generate  massive  data  sets 
(e.g.  Brown  and  Botstein,  1999;  Chu  et  al.,  1998;  DeRisi 
et  al.,  1997;  Eisen  et  al.,  1998;  Somogyi  et  al.,  1997; 
Toronen  et  al.,  1999).  Although  the  top-down  approach 
shows  considerable  promise,  it  is  unlikely  that  this  method 
alone  will  provide  a  satisfactory  solution  to  the  problem  of 
modeling  large  and  complex  biochemical  systems. 

If  one  is  interested  in  modeling  a  specific  system 
(e.g. the tryptophan biosynthetic pathway of Escherichia
coli),  the  best  way  to  proceed  is  to  measure  all  the 
necessary  parameters  of  the  system  in  the  organism  of 
interest  and  build  the  model  based  on  those  values. 
One  could  productively  combine  the  bottom-up  and  top- 
down  approaches  described  above  (Bliss  et  al.,  1982; 
Yanofsky  and  Horn,  1994).  On  the  other  hand,  if  one  is 
interested  in  a  generic  class  of  systems  (e.g.  amino  acid 
biosynthetic  pathways  in  general)  or  if  the  measurements 
are  impossible  to  perform  with  accuracy  and  precision, 
even  a  combination  of  the  two  approaches  may  not  be 
adequate. 

In this paper we propose a statistical approach
for dealing with generic classes of biochemical systems.
We apply this approach to a general three-step
unbranched biosynthetic pathway with inhibitory
feedback. This pathway is an abstraction from
the collection of unbranched pathways responsible
for the biosynthesis of amino acids (e.g. see
http://www.genome.ad.jp/kegg/dblinks/map/map01150.html).
The results of our analysis demonstrate that certain properties
of these systems are strongly correlated, thereby
revealing aspects of organization that are highly probable
independent of selection.




Fig.  1.  Three-step  unbranched  biosynthetic  pathway  with  inhibitory 
feedback.  The  metabolites  are  represented  by  X  with  an  appropriate 
subscript.  The  horizontal  arrows  represent  chemical  conversion, 
whereas  the  vertical  arrows  represent  modifier  influences  either 
positive  or  negative.  This  pathway  can  be  viewed  as  an  abstraction 
of  the  biosynthetic  pathways  for  amino  acids. 

Methods 

Amino  acid  biosynthetic  pathways  and  their  regulation 
have  been  studied  intensively  for  more  than  40  years. 
There  is  widespread  acceptance  among  cell  physiologists 
that  the  principal  role  of  these  systems  is  to  provide  a 
homeostatically  regulated  supply  of  amino  acid  for  protein 
synthesis.  This  role  has  been  characterized  in  terms  of 
several  behaviors  that  can  be  described  by  quantitative 
criteria  (Savageau,  1976)  that  will  be  elaborated  upon  in 
this  paper. 

Systemic  description  and  analysis 
An unbranched three-step pathway with feedback inhibition
is depicted in Figure 1. The independent variable X4
represents the cell demand for the end product X3. If the
cell requires large amounts of X3, then the value of X4
will be high; if small amounts of X3 are required, then the
value of X4 will be low. The dynamic behavior of such
a model can be described by a set of ordinary differential
equations, one equation per intermediate. This set of equations
can be approximated to the first order in logarithmic
space, yielding another set of ordinary differential equations
with the canonical form of an S-system (Savageau,
1969):

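\[
\begin{aligned}
\frac{dX_1}{dt} &= \alpha_1 X_0^{g_{10}} X_3^{g_{13}} - \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} \\
\frac{dX_2}{dt} &= \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} - \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} \\
\frac{dX_3}{dt} &= \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} - \alpha_4 X_3^{g_{43}} X_4^{g_{44}}
\end{aligned}
\tag{1}
\]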

The multiplicative parameters, α, can be interpreted as
rate constants that are always positive. The exponential
parameters, g, can be interpreted as kinetic orders that
represent the direct influence of each species on each
rate law. If Xi is directly involved in the reactions of the
aggregate rate law Vj, as either a substrate or a modulator,
and if an increase in Xi causes an increase in the rate
Vj, then the kinetic order will be positive. If an increase
in Xi causes a decrease in Vj, then the kinetic order
will be negative. If Xi is not directly involved in Vj,
then the kinetic order will be zero. The kinetic orders
g_{i+1,i} (0 ≤ i ≤ 3) in (1) are positive because these are
the kinetic orders for substrates of reactions. The kinetic
order g44 will be set arbitrarily equal to 1 for the remainder
of this paper in order to simplify subsequent calculations.
This will not affect our results in any significant way
since g44 is simply a scale factor in logarithmic space.
The remaining kinetic orders, which represent negative
feedback interactions, are negative.
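
As a rough illustration of equation (1), the following Python sketch integrates the three-step S-system with a fixed-step fourth-order Runge-Kutta scheme. The function names, the dictionary layout for the kinetic orders, the step size, and all numerical values are illustrative assumptions, not values taken from this paper.

```python
def rates(x, x0, x4, alpha, g):
    """Aggregate rate laws V1..V4 appearing in equation (1)."""
    x1, x2, x3 = x
    v1 = alpha[0] * x0 ** g["10"] * x3 ** g["13"]
    v2 = alpha[1] * x1 ** g["21"] * x2 ** g["22"] * x3 ** g["23"]
    v3 = alpha[2] * x2 ** g["32"] * x3 ** g["33"]
    v4 = alpha[3] * x3 ** g["43"] * x4 ** g["44"]
    return v1, v2, v3, v4


def derivatives(x, x0, x4, alpha, g):
    """dX1/dt, dX2/dt, dX3/dt: production minus consumption."""
    v1, v2, v3, v4 = rates(x, x0, x4, alpha, g)
    return (v1 - v2, v2 - v3, v3 - v4)


def integrate(x, x0, x4, alpha, g, dt=0.01, steps=4000):
    """Fixed-step RK4 integration (step size and length are assumptions)."""
    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    for _ in range(steps):
        k1 = derivatives(x, x0, x4, alpha, g)
        k2 = derivatives(shifted(x, k1, 0.5 * dt), x0, x4, alpha, g)
        k3 = derivatives(shifted(x, k2, 0.5 * dt), x0, x4, alpha, g)
        k4 = derivatives(shifted(x, k3, dt), x0, x4, alpha, g)
        x = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(x, k1, k2, k3, k4)
        )
    return x


# Hypothetical parameter set: with all rate constants and both independent
# variables equal to 1, the steady state is X1 = X2 = X3 = 1.
ALPHA = [1.0, 1.0, 1.0, 1.0]
G = {"10": 0.5, "13": -0.5, "21": 1.0, "22": -0.2, "23": -0.3,
     "32": 1.0, "33": -0.4, "43": 1.0, "44": 1.0}

final_state = integrate((1.2, 0.9, 1.1), 1.0, 1.0, ALPHA, G)
```

Starting from a small displacement, the trajectory for this particular parameter set relaxes back to the steady state, which is the behavior required of a functional pathway.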

At  a  steady  state,  the  rate  of  production  and  the  rate  of 
consumption  will  be  equal  for  each  intermediate,  and  (1) 
reduces  to  the  following  matrix  equation  (Savageau, 
1969): 


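\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} b_1 - g_{10} Y_0 \\ b_2 \\ b_3 + g_{44} Y_4 \end{bmatrix}
\]

where b_i = ln(α_{i+1}/α_i), a_ij = g_ij - g_{i+1,j} and Y_i =
ln(X_i). This linear equation is easily solved, e.g. using
Cramer's rule, to provide a steady-state expression in
symbolic form for each Y_i.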
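
The steady-state calculation can be sketched numerically as follows. This is a minimal Python example, assuming the matrix entries a_ij = g_ij - g_{i+1,j} with kinetic orders that do not appear in (1) taken as zero; the helper names (`det3`, `steady_state`) and the parameter values are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def steady_state(alpha, g, x0, x4):
    """Solve the log-linear steady-state equations by Cramer's rule."""
    a = [[-g["21"], -g["22"], g["13"] - g["23"]],
         [g["21"], g["22"] - g["32"], g["23"] - g["33"]],
         [0.0, g["32"], g["33"] - g["43"]]]
    rhs = [math.log(alpha[1] / alpha[0]) - g["10"] * math.log(x0),
           math.log(alpha[2] / alpha[1]),
           math.log(alpha[3] / alpha[2]) + g["44"] * math.log(x4)]
    d = det3(a)
    ys = []
    for k in range(3):
        # Replace column k of the matrix by the right-hand side.
        ak = [[rhs[i] if j == k else a[i][j] for j in range(3)]
              for i in range(3)]
        ys.append(det3(ak) / d)
    return [math.exp(y) for y in ys]


# Hypothetical values chosen only to exercise the function.
ALPHA = [2.0, 1.0, 3.0, 0.5]
G = {"10": 0.5, "13": -0.5, "21": 1.0, "22": -0.2, "23": -0.3,
     "32": 1.0, "33": -0.4, "43": 1.0, "44": 1.0}
X0, X4 = 2.0, 1.5
x1, x2, x3 = steady_state(ALPHA, G, X0, X4)
```

A quick consistency check is that the four aggregate rates of (1) are equal when evaluated at the returned concentrations.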

Other steady-state magnitudes of interest can be calculated
in a similar way. Logarithmic gains quantify the influence
of each independent variable on each dependent
variable; e.g. the logarithmic gain

\[
L(X_i, X_0) = \frac{d\ln(X_i)}{d\ln(X_0)} = \frac{dY_i}{dY_0}
\]

gives the percentage change in an intermediate Xi caused
by a percentage change in X0. These logarithmic gains are
calculated analytically at the steady state (Savageau, 1971)
by differentiating each Yi with respect to Y0.
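
Because the steady-state equations are linear in the logarithms, the logarithmic gains follow from solving the same linear system with a differentiated right-hand side. The sketch below assumes the matrix entries a_ij = g_ij - g_{i+1,j}; the function names and the parameter values are hypothetical, and the kinetic orders g22, g23 and g33 are set to zero only so that the result can be checked by hand.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(a)
    return [det3([[rhs[i] if j == k else a[i][j] for j in range(3)]
                  for i in range(3)]) / d
            for k in range(3)]


def log_gains(g, independent):
    """Return [L(X1, .), L(X2, .), L(X3, .)] for 'X0' or 'X4'."""
    a = [[-g["21"], -g["22"], g["13"] - g["23"]],
         [g["21"], g["22"] - g["32"], g["23"] - g["33"]],
         [0.0, g["32"], g["33"] - g["43"]]]
    if independent == "X0":
        rhs = [-g["10"], 0.0, 0.0]   # derivative of b1 - g10*Y0 w.r.t. Y0
    elif independent == "X4":
        rhs = [0.0, 0.0, g["44"]]    # derivative of b3 + g44*Y4 w.r.t. Y4
    else:
        raise ValueError("independent must be 'X0' or 'X4'")
    return solve3(a, rhs)


# Hypothetical kinetic orders; only the overall feedback g13 is active.
G_FEEDBACK = {"10": 0.5, "13": -0.5, "21": 1.0, "22": 0.0, "23": 0.0,
              "32": 1.0, "33": 0.0, "43": 1.0, "44": 1.0}
G_OPEN = dict(G_FEEDBACK, **{"13": 0.0})

gains_feedback = log_gains(G_FEEDBACK, "X0")   # each gain is 1/3
gains_open = log_gains(G_OPEN, "X0")           # each gain is 1/2
gains_demand = log_gains(G_FEEDBACK, "X4")     # 1/3, 1/3, -2/3
```

In this toy case the overall feedback lowers every gain with respect to X0 from 1/2 to 1/3, which illustrates how inhibition buffers the pathway against changes in the initial substrate.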
Parameter sensitivities quantify the influence of each
parameter on each dependent variable of the system; e.g.
the sensitivity

\[
S(X_i, p_j) = \frac{d\ln(X_i)}{d\ln(p_j)} = p_j \frac{dY_i}{dp_j}
\]

gives the percentage change in the concentration Xi
caused by a percentage change in the parameter pj.
These parameter sensitivities and those of the steady-state
flux are also calculated analytically at the steady state.
The parameter sensitivities give important information
about the sensitivity of the system to perturbations in its
structure.

The steady state for an unbranched biosynthetic pathway
should be locally stable; i.e. the system should return
to its original steady state after a small perturbation in the
variables (as opposed to the parameters) of the system. If
this does not occur, the system is dysfunctional. The stability
can be determined by using the well-known Routh
criteria (Savageau, 1976).

Any of these systemic properties can be analytically
determined in the steady state by using the S-systems local
representation. However, having an analytical expression
for these systemic properties is just the first step in the
analysis of a system. Interpretation of these analytical
expressions can be problematic because they depend
on many parameters and their behavior is too complex
for easy visualization. Even when a general qualitative
interpretation can be obtained just by looking at the
closed-form expressions [e.g. L(X3, X0) < L(X1, X0)],
the results are difficult to quantify [e.g. how much larger
is L(X1, X0)?].

Also, there are no general closed-form solutions for
the dynamic properties of the system. To analyze these
properties one must specify numerical values for the
parameters and solve the differential equations (1) using
numerical techniques. An example of such a property is
the settling time of a system, which is defined as the time
required for a system to return to its steady state after a
perturbation in the levels of its metabolites. The settling
time also gives us an indication of the average transit
time for material passing through the system. Short transit
times allow a system to respond rapidly to changes in its
environment (Savageau, 1972).

Defining classes of systems for statistical comparison

If one wishes to understand the general properties of
pathways such as the one depicted in Figure 1, then one
faces the following dilemma. General results that follow
from the closed-form analytical expressions may be too
complex to interpret and quantify, and quantitative results
for particular values of the parameters do not yield general
insights. One way of resolving this dilemma is to study the
statistical properties for a class of systems generated by an
ensemble of sets of parameter values. We shall consider
two different methods for defining the class of interest.


Structural  classes.  Systems  that  have  the  same  network 
topology  (i.e.  have  the  same  pattern  of  interactions  among 
their  elements  and  the  same  signs  for  the  interactions)  will 
be  defined  as  members  of  the  same  structural  class.  As 
a  case  study  for  this  paper  we  have  chosen  the  system 
in  Figure  1  and  described  its  local  behavior  by  the  S- 
system  representation  within  the  power-law  formalism.  By 
so  doing  we  have  defined  a  specific  class  of  systems  that 
share  the  same  network  topology.  By  focusing  on  such  a 
topology  we  have  limited  the  study  to  systems  belonging 
to the same structural class. Individual members of this
structural  class  can  be  generated  by  sampling  the  space  of 
parameters  that  define  the  class  and  their  characteristics 
can  be  obtained  from  the  corresponding  solutions  of  (1). 

Behavioral  classes.  Systems  that  exhibit  a  specific  type 
of  systemic  behavior  will  be  defined  as  members  of  the 
same behavioral class. For example, those systems belonging
to the structural class in Figure 1 that have a single locally
stable steady state can be defined as members of a
behavioral  class.  Individual  members  of  such  a  behavioral 
class  cannot  be  generated  directly  by  sampling  at  random 
the  space  of  parameters  because  some  of  the  parameter 
sets  will  produce  unstable  systems.  Instead,  they  must  be 
generated  indirectly,  e.g.  by  sampling  at  random  the  space 
of  parameters,  testing  the  sample  for  the  desired  behavior, 
and  then  retaining  only  the  relevant  samples. 
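
The indirect generation of a behavioral class described above amounts to rejection sampling. A minimal Python sketch follows; the function name `sample_behavioral_class`, its arguments, and the toy acceptance test are hypothetical placeholders rather than the procedure actually used for the results reported here.

```python
import random


def sample_behavioral_class(draw, belongs, size, rng, max_draws=1_000_000):
    """Collect `size` parameter sets that pass the behavioral test.

    draw(rng)    -> one random parameter set from the structural class
    belongs(p)   -> True if the resulting system shows the desired behavior
    """
    accepted = []
    draws = 0
    while len(accepted) < size:
        if draws >= max_draws:
            raise RuntimeError("acceptance rate too low for max_draws")
        draws += 1
        candidate = draw(rng)
        if belongs(candidate):
            accepted.append(candidate)   # retain only relevant samples
    return accepted, draws


# Toy usage: one inhibitory kinetic order drawn from (-5, 0), with a
# made-up behavioral test standing in for, e.g., a stability check.
rng = random.Random(0)
members, n_draws = sample_behavioral_class(
    draw=lambda r: r.uniform(-5.0, 0.0),
    belongs=lambda g13: g13 > -2.0,
    size=100,
    rng=rng,
)
```

The ratio `len(members) / n_draws` estimates the fraction of the structural class that also belongs to the behavioral class.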

In  the  example  above  the  behavioral  class  is  a  subclass 
within  the  structural  class,  but  this  need  not  be  so.  If 
our  only  knowledge  of  the  system  was  that  it  had  three 
metabolites,  we  could  study  an  ensemble  of  models  in 
which  each  kinetic  order  might  have  positive  or  negative 
values,  which  generates  models  belonging  to  different 
structural  classes.  One  could  then  choose  models  for  study 
based  simply  on  their  behavior,  disregarding  the  signs  of 
the  kinetic  orders. 

Several (elementary) behavioral classes can be combined
to define a composite behavioral class whose
members are systems that exhibit all of the individual
systemic behaviors.

Sampling  the  parameter  space 

The regular structure of the local S-system representation
facilitates building the ensemble of sets of parameter
values. The positive kinetic orders g_{i+1,i} refer to enzymes
binding their substrates. The maximum value for these
kinetic orders is given by the number of substrate binding
sites on the enzyme.† In the majority of cases there are
fewer than four such sites (Hlavacek and Savageau, 1995;
Voit and Savageau, 1987). Thus we will assume that these
kinetic orders have values between 0 and 5. The negative
kinetic orders (g13, g22, g23 and g33) refer to enzymes
binding inhibitors. In most cases there are again fewer than
four such binding sites per enzyme, and we will assume
that these kinetic orders have values between -5 and 0.
One can always normalize the time scale with respect to
one of the rate constants. The others will be assumed to
have normalized values within 5 orders of magnitude of 1.
Thus, the logarithm of each normalized rate constant will
have values between -5 and 5.

† This is not always true of reversible reactions operating close to equilibrium.
The usual strategy for aggregating fluxes can lead to kinetic orders
with extremely large absolute values. This problem can be solved by using
an alternative strategy for aggregating fluxes (Sorribas and Savageau, 1989).
However, we will not deal with these cases here.
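
The parameter ranges stated above can be sampled as in the following Python sketch. Two choices here are assumptions made only for illustration: the logarithm of the rate constants is taken to be base 10, and all four rate constants are drawn, since the text does not say which one is used to normalize the time scale. The names `draw_parameter_set`, `POSITIVE_ORDERS` and `NEGATIVE_ORDERS` are hypothetical.

```python
import random

POSITIVE_ORDERS = ("10", "21", "32", "43")   # substrate kinetic orders
NEGATIVE_ORDERS = ("13", "22", "23", "33")   # inhibitory kinetic orders


def draw_parameter_set(rng):
    """Draw one parameter set uniformly from the allowed ranges."""
    g = {key: rng.uniform(0.0, 5.0) for key in POSITIVE_ORDERS}
    g.update({key: rng.uniform(-5.0, 0.0) for key in NEGATIVE_ORDERS})
    g["44"] = 1.0                             # fixed scale factor
    # Rate constants are sampled uniformly in logarithmic space
    # (base 10 assumed), i.e. within 5 orders of magnitude of 1.
    log_alpha = [rng.uniform(-5.0, 5.0) for _ in range(4)]
    alpha = [10.0 ** value for value in log_alpha]
    return alpha, g


rng = random.Random(2024)
alpha, g = draw_parameter_set(rng)
```

Sampling the rate constants in logarithmic space gives each order of magnitude the same weight, which matches the statement that the logarithm of each normalized rate constant lies between -5 and 5.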

In building an appropriate ensemble of sets of parameter
values one needs to use a representative sample of the
allowable parameter space. Since the statistical distribution
of parameter values in real-life systems is unknown,
the most appropriate approach is to sample the space uniformly.
There are several strategies for accomplishing this.

First, one can impose a regular grid on the multidimensional
parameter space and use the vertices of that grid
to define the set of parameter values. In general, a system
with n unknown parameters and the same grid size,
ω, for each parameter will require ω^n samples. This exponential increase in
number of required samples makes it difficult to maintain
a dense grid as the number of parameters increases. Also,
maintaining a rigid grid complicates matters when one is
studying ensembles of parameter sets that give rise to certain
types of systemic behavior. Second, pseudo-random
number generators can be used to generate the largest possible
sample size without having a rigid grid to sample
from. This method facilitates the study of ensembles of
parameter sets that give rise to certain types of systemic
behavior. Third, strategies based on number theory can be
used to generate what are known as quasi-random numbers
that are uniformly distributed. Examples include Halton
and Sobol sequences [for a review see Bratley and
Fox (1988)]. Finally, another technique devised for dealing
with large parameter spaces is the Latin Hypercube.
The Latin Hypercube ensures that each parameter will be
sampled in every one of its sub-ranges. It has no advantage
over the other methods mentioned above if there are
important interactions between parameters [for a discussion
see Dunn and Clark (1974)]. For the results reported
below we have used the pseudo-random number generator.
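
For comparison with pseudo-random sampling, the quasi-random alternative mentioned above can be sketched with a Halton sequence, which is built from the radical inverse of the sample index in a different prime base for each dimension. This is an illustrative Python sketch only; the function names are hypothetical and this is not the generator used for the results reported here.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result


def halton(n_points, bases):
    """First n_points of the Halton sequence, one prime base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]


def scale(u, low, high):
    """Map a point of the unit interval onto a parameter range."""
    return low + u * (high - low)


# Two-dimensional example: a substrate kinetic order in (0, 5) and an
# inhibitory kinetic order in (-5, 0).
points = halton(3, (2, 3))
scaled = [(scale(u, 0.0, 5.0), scale(v, -5.0, 0.0)) for u, v in points]
```

The first three points are (1/2, 1/3), (1/4, 2/3) and (3/4, 1/9), which already spread over the unit square more evenly than three independent random draws typically would.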

Specifying  behavioral  classes 

Since the system in Figure 1 is an abstraction of an unbranched
biosynthetic pathway, we searched the literature
and identified a basic set of desirable characteristics
for such systems. The group of all these characteristics
was used to define a composite behavioral class. If
the model generated by a given set of parameter values did
not belong to this class, then the set was discarded and a
new random set was tested. In this way we generated ensembles
of 5000 systems for our studies.

The  composite  behavioral  class  studied  is  defined  by  a 
collection  of  six  elementary  behavioral  classes  with  the 
following  characteristics: 

B1. The steady-state concentration of pathway intermediates
should be low when compared with the concentration
of the final product. The major function
of unbranched biosynthetic pathways is production
of their end product (e.g. X3 in the example of Figure 1).
High concentrations of intermediates per se
are unnecessary; they would tax the solvent capacity
of the cell and potentially interfere in a nonspecific
way with otherwise unrelated reactions (see e.g.
Atkinson, 1969; Savageau, 1972; Srere, 1987 and
Levine and Ginsburg, 1985, for a general discussion
of the subject from different perspectives). For
the results presented in the next section, a parameter
set was accepted only if the steady-state ratio
(X1 + X2)/X3 < 0.1. This value for the ratio was
chosen arbitrarily because there are no reliable measurements
on which to base a more accurate estimate.

B2.  Changes  in  the  concentration  of  intermediates 
caused  by  changes  in  demand  for  the  end  product 
should  be  small.  The  previous  condition  ensures 
that  the  concentration  of  intermediates  will  not 
saturate  the  solvent  capacity  of  a  cell  in  a  given 
steady  state.  However,  if  the  metabolic  conditions 
change  and  the  demand  for  the  end  product  of  the 
pathway  changes,  this  will  cause  the  concentration 
of  each  intermediate  to  change,  which  may  lead 
to  saturation  of  the  solvent  capacity  in  the  new 
steady  state  (e.g.  Savageau,  1972).  This  could  be 
prevented in our model if the absolute values of the
logarithmic gains for intermediates, |L(Xi, X4)|,
are smaller than a predetermined value arbitrarily
set at 0.5.

B3. Changes in the concentration of intermediates
caused by changes in the initial substrate should
be small. This will buffer the intermediate concentrations
against changes in metabolism that
are reflected in alterations in the level of initial
substrate. This could be ensured if the absolute
values of the logarithmic gains for intermediates,
|L(Xi, X0)|, are smaller than a predetermined
value, e.g. |L(Xi, X0)| < 0.5 (i = 1, 2, 3). This
value is chosen arbitrarily because there are no
reliable measurements on which to base a more
accurate estimate.

B4.  Systems  should  be  robust,  i.e.  insensitive  to  spurious 
fluctuations  in  the  parameters  that  define  their 
structure (Savageau, 1972). We require that each
intermediate have an aggregate sensitivity, defined
as sqrt[Σj S(Xi, pj)^2], less than a predetermined
value arbitrarily set equal to 5.

B5.  Each  system  should  have  a  locally  stable  steady 
state.  Systems  without  such  stable  steady  states  are 
dysfunctional  because  they  are  unable  to  maintain 
their  homeostatic  behavior  in  the  face  of  spurious 
perturbations.  The  two  margins  of  stability  can  be 
specified  in  terms  of  the  last  two  Routh  criteria  (e.g. 
Savageau,  1976). 


B6.  Systems  should  have  a  rapid  response  time.  This  is 
related  to  the  inverse  of  the  turnover  number  (Dixon, 
1958;  Savageau,  1975),  which  should  therefore 
be high. We require the turnover number for the
pathway, defined as the pathway flux divided by
the sum of the intermediate pools (V / Σi Xi), to
be larger than a predetermined value arbitrarily set
equal to 1.
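
The acceptance test for the composite class can be sketched in Python as below. Several points are assumptions made for this illustration: B4 (aggregate sensitivities) is omitted for brevity, the intermediates in B1, B2 and B6 are taken to be X1 and X2, and B5 is checked with the Routh-Hurwitz conditions on the Jacobian J_ij = V a_ij / X_j, which follows from linearizing (1) at the steady state. All function names and parameter values are hypothetical.

```python
import math


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Cramer's rule for a 3x3 linear system."""
    d = det3(a)
    return [det3([[rhs[i] if j == k else a[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]


def composite_checks(alpha, g, x0=1.0, x4=1.0):
    """Evaluate criteria B1, B2, B3, B5 and B6 for one parameter set."""
    a = [[-g["21"], -g["22"], g["13"] - g["23"]],
         [g["21"], g["22"] - g["32"], g["23"] - g["33"]],
         [0.0, g["32"], g["33"] - g["43"]]]
    b = [math.log(alpha[i + 1] / alpha[i]) for i in range(3)]
    y = solve3(a, [b[0] - g["10"] * math.log(x0), b[1],
                   b[2] + g["44"] * math.log(x4)])
    xs = [math.exp(v) for v in y]
    x1, x2, x3 = xs
    flux = alpha[0] * x0 ** g["10"] * x3 ** g["13"]
    gain_x0 = solve3(a, [-g["10"], 0.0, 0.0])
    gain_x4 = solve3(a, [0.0, 0.0, g["44"]])
    # Jacobian at the steady state and its characteristic polynomial
    # lambda^3 + c1*lambda^2 + c2*lambda + c3.
    jac = [[flux * a[i][j] / xs[j] for j in range(3)] for i in range(3)]
    c1 = -(jac[0][0] + jac[1][1] + jac[2][2])
    c2 = sum(jac[i][i] * jac[j][j] - jac[i][j] * jac[j][i]
             for i, j in ((0, 1), (0, 2), (1, 2)))
    c3 = -det3(jac)
    return {
        "B1": (x1 + x2) / x3 < 0.1,
        "B2": all(abs(v) < 0.5 for v in gain_x4[:2]),
        "B3": all(abs(v) < 0.5 for v in gain_x0),
        "B5": c1 > 0 and c3 > 0 and c1 * c2 > c3,
        "B6": flux / (x1 + x2) > 1.0,
    }


G = {"10": 0.5, "13": -0.5, "21": 1.0, "22": 0.0, "23": 0.0,
     "32": 1.0, "33": 0.0, "43": 1.0, "44": 1.0}
rejected = composite_checks([1.0, 1.0, 1.0, 1.0], G)
accepted = composite_checks([1.0, 100.0, 100.0, 1.0], G)
```

With all rate constants equal, the intermediates are as abundant as the end product and the set fails B1 and B6; raising α2 and α3 a hundredfold shrinks the intermediate pools to 0.01 and the same kinetic orders then pass every criterion tested here.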


Bias  in  the  frequency  distribution  of  parameter  values 

The values for each parameter were originally sampled
with a uniform distribution. However, those parameter sets
that define systems excluded from the composite behavioral
class are rejected, and the frequency distribution of
the accepted parameter values is therefore biased. The nature
of the bias for each of the parameters can be determined
from the histograms presented in Figure 2. We observe
that the composite behavioral class has α1 biased
towards small values whereas α2, α3, and α4 are biased towards
large values. The kinetic order for the substrate of
the pathway, g10, is biased towards small values. Its frequency
increases from g10 = 0 to g10 = 0.3 and then
decreases exponentially until g10 = 5. A similar pattern
is observed for g32, although the frequency increases from
0 to 1.8, and then decreases but not exponentially. The kinetic
order g21 is biased towards large values, and g43 is
nearly uniform over its range. The inhibitory kinetic order
for overall feedback, g13, has a distribution with a central
tendency, whereas the other inhibitory kinetic orders
are almost uniformly distributed throughout their range of
possible values.

We  also  determined  the  parameter  distributions  for 
each  of  the  elementary  behavioral  classes  (B1-B6  defined 
above)  to  see  which,  if  any,  might  qualitatively  reproduce 
the  deviations  from  a  uniform  distribution  that  were 
observed  for  the  composite  behavioral  class  (Figure  2). 
Table 1 shows which elementary class is mainly responsible
for the shape of each distribution in the composite
behavioral class. In some cases, the distribution for
the composite behavioral class can be attributed to the
dominant influence of a particular elementary class (e.g.
B3 in the case of g10). In other cases, the distributions
for the composite behavioral class can be attributed to
the influence of several elementary classes acting in
combination, which implies a synergistic influence (e.g.
B1-B6 in the case of α3).


Fig. 2. Distribution of parameter values in ensembles of systems selected on the basis of various behavioral classes. Selection involved
each of the six elementary behavioral classes (B1-B6) considered separately and the composite class consisting of all six elementary classes
considered together. The solid line in each panel is the distribution for the composite behavioral class. Three different patterns are represented.
In most cases the distribution for the composite class is closely represented by the distribution for one of the elementary classes (α1, α4,
g10, g22, g33). The distributions for the other elementary classes have very different shapes and are not shown. In four cases the distribution
for the composite class is closely resembled by two or more of the distributions for the elementary classes (g21, g43, g13, g23). Distributions
for only two of the elementary classes are shown. In two cases none of the distributions for the elementary classes is a close match to
the distribution for the composite class (α3, g32). In these cases we show only the distribution for the elementary class that most closely
resembles the distribution for the composite class.

Frequency distribution for systemic properties of the
ensemble

The frequency distributions for all steady-state properties
of our model have long tails. These tails make it difficult
to present informative histograms for each of the systemic
properties. We chose to cut off the tails and add their
frequency to the more extreme classes presented in the
histograms. The results in this section are shown as
histograms in Figure 3. We did not include histograms for
the elementary behavioral classes because, in most cases,
they have extremely long tails.
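
Folding the tails into the outermost classes of a histogram can be sketched as follows. The function name, the bin layout, and the sample values are illustrative assumptions; the actual cut-off points used for Figure 3 are not stated in this section.

```python
def folded_histogram(values, low, high, n_bins):
    """Histogram on [low, high] whose end bins absorb the tails."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in values:
        k = int((value - low) // width)
        k = min(max(k, 0), n_bins - 1)   # clamp out-of-range values
        counts[k] += 1
    return counts


# Toy data with one low outlier and two high outliers.
sample = [0.1, 0.3, 0.9, 5.0, 100.0, -1.0]
counts = folded_histogram(sample, low=0.0, high=1.0, n_bins=2)
```

Here the two bins each receive three values: the lower bin absorbs the outlier at -1.0 and the upper bin absorbs the outliers at 5.0 and 100.0, so the total count still equals the sample size.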

Steady-state  concentrations  and  flux.  All  steady-state 
concentrations  have  frequency  distributions  that  decrease 


as  the  concentration  increases.  At  low  concentrations  the 
frequency  decreases  very  sharply  as  the  concentrations 
increase,  but  then  the  decrease  becomes  very  small  and 
there  is  a  long  tail  in  the  distribution.  The  modal  class 
for all of the frequency distributions is small. For X1
and X2 the modal class is in the interval [0, 0.234] with
90% and 75% of all systems in this interval, respectively.
The  modal  class  for  X3  has  a  larger  value,  in  the  interval 
[0.234,0.468].  Also,  only  10%  of  all  systems  fall  within 

Fig. 5. Examples of graphs showing the statistical synergism
between two different parameters. Each statistical synergism is
determined by a plot of the moving median for the sensitivity with
respect to parameter pi versus the moving median for parameter pj
constructed from an ensemble of systems selected on the basis of the
composite behavioral class. Note the asymmetry in the synergisms.
See text for discussion.


Discussion 

The  study  of  generic  biochemical  systems  requires  a 
mathematical  formalism  that  is  systematically  structured 
and  capable  of  representing  rather  arbitrary  nonlinear 
phenomena.  The  power-law  formalism  provides  such  a 
canonical  nonlinear  representation  (Savageau,  1996),  and 
for  the  work  presented  in  this  paper  we  have  focused  on 
the  local  S-system  representation  within  this  formalism. 
This  representation,  although  nonlinear,  has  closed-form 
solutions  for  the  steady  state,  and  these  can  be  used  to 
study  systemic  properties  analytically.  However,  more 
often  than  not,  the  complexity  of  the  solutions  for  the 
properties  of  interest  makes  it  difficult  to  analyze  systemic 
behavior without assigning specific values to the parameters.
In most cases these values are unknown; when they
are  known,  they  limit  the  interpretation  of  the  results  to 
a  specific  system  and  thus  prevent  generalization  of  the 
results.  To  overcome  these  limitations  statistical  studies 
involving  large  ensembles  of  random  systems  have  been 
performed  in  a  variety  of  contexts  (see,  e.g.  Bhattacharjya 
and  Liang,  1996;  Glass,  1975;  Kauffman,  1969a, b,  1993, 
and  references  therein).  However,  to  our  knowledge  this 
approach  has  not  been  applied  to  continuous  models  for 
specific  classes  of  biochemical  systems  with  the  objective 


of  providing  an  exhaustive  statistical  characterization  of 
their  systemic  properties. 

In this work we have created large ensembles of randomly
generated parameter values for a given structural
class  of  biochemical  systems  and  imposed  selection  on  the 
basis  of  particular  systemic  properties.  We  then  examined 
the  resulting  systems  for  bias  in  parameter  values,  bias  in 
unselected  systemic  properties,  and  correlations  among  all 
their  systemic  properties. 

Selection  can  be  expected  to  influence  the  range  of 
parameter  values  in  the  resulting  systems.  Although 
specific  systemic  properties  have  been  used  for  some  time 
as  criteria  to  evolve  networks  towards  optimality  (e.g. 
James  et  al.,  1999),  few,  if  any,  attempts  have  been  made 
to  characterize  the  bias  in  parameter  values  that  results 
from  such  a  selection  procedure.  In  fact,  the  usual  view 
on  the  subject  is  that  parameter  values  determine  systemic 
behavior.  We  have  had  to  take  the  opposite  view  to  learn 
how  selection  for  particular  systemic  behaviors  influences 
the  frequency  distribution  of  parameter  values.  As  seen 
in  Figure  2,  there  are  regular  patterns  of  deviation  from 
what  was  a  uniform  distribution  before  imposing  selection 
based  on  the  composite  behavioral  class.  By  using  each 
of  the  elementary  behavioral  classes  as  an  independent 
selection  criterion  we  were  able  to  determine  whether  any 
given  elementary  class  made  a  major  contribution  to  the 
observed  bias  in  the  distribution  of  any  given  parameter 
value. In some cases this is true (B3 in the case of g10),
whereas in others the distribution of parameter values
for the composite class is the result of interplay among
different elementary classes (B1-B6 in the case of α3).

Information on the distribution of parameter values is of
interest in the design of experiments to measure the parameters
in actual systems. By knowing the most probable
values of a parameter, one can design experiments to target
that range. Also, the use of behavioral classes to study specific
kinds of systems provides an effective way to identify
the relative importance of various regions in the parameter
space of fit systems.

Selection  for  a  particular  systemic  property  may  also 
influence  other  unselected  systemic  properties.  As  seen 
in  Figure  3,  selection  on  the  basis  of  the  composite 
behavioral  class  produces  a  frequency  distribution  for  the 
values  of  the  different  systemic  properties  that  is  skewed 
in  nearly  every  case,  with  a  peak  at  low  values  and  a 
long tail that decreases in frequency almost exponentially.
(This is true of the distributions for the aggregate
sensitivities, although it is not evident from the curves
for X1 and X2 because their tails are off the scale.) The
exceptions to this general pattern are the distributions
for the logarithmic gain L(Xi, X0), which is nearly
uniform over the range [0,5], and the logarithmic gains
L(X1, X4), L(X2, X4), and L(V, X4), which exhibit a
symmetric central tendency. We have also determined the
influence of the elementary behavioral classes on these
distributions,  but  the  results  are  less  straightforward  to 
interpret.  In  many  cases  the  distribution  for  the  composite 
class  is  quite  different  from  the  distribution  for  any  of 
the  elementary  classes  (data  not  shown).  This  indicates 
a  strong  synergism  between  different  constraints  that 
determine  the  distribution  of  values  for  the  systemic 
properties. 

Selection also can be expected to influence the correlations
among the various systemic properties in the resulting
systems. We have used a moving median technique
(Alves and Savageau, 2000) to determine average correlations
between different systemic properties. We found
that these correlations exist and are, at least in some cases,
dependent on the behavior of the system (Table 2). For
example, most aggregate sensitivities are correlated with
the concentrations by a symmetric curve of type C8 if no
restrictions are imposed on the values of the concentrations
in the system (data not shown). However, when we
imposed the condition that intermediate concentrations be
small (B1), this kind of symmetry breaks down (curves of
type C8 and C9 become C3 and C4), because the systems
being studied include only those with concentrations that
have low values. It is important to note that, as a concentration
tends to unity, the sensitivities to the kinetic orders
associated with that concentration will tend to zero (due
to the properties of the power law in the S-system formalism).
This will tend to diminish the aggregate sensitivities
that include these kinetic-order sensitivities. Table 2 also
shows that in the composite class, robustness of intermediates
and stability margins are inversely correlated; systems
that have large stability margins have, on average, intermediates
with high aggregate sensitivities and are thus less
robust.
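
One plausible reading of a moving median is sketched below in Python: the ensemble is sorted by one property, a window of fixed size slides along the sorted list, and the medians of both properties within each window are plotted against each other. The window size, the function name, and the toy data are assumptions for illustration; the exact procedure is the one given by Alves and Savageau (2000) and may differ in detail.

```python
from statistics import median


def moving_median(xs, ys, window):
    """Median of x and of y in each sliding window over x-sorted pairs."""
    pairs = sorted(zip(xs, ys))
    points = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        points.append((median(p[0] for p in chunk),
                       median(p[1] for p in chunk)))
    return points


# Toy ensemble in which property y grows with property x.
xs = [3, 1, 2, 5, 4]
ys = [30, 10, 20, 50, 40]
trend = moving_median(xs, ys, window=3)
```

Because medians are insensitive to extreme values, the resulting curve follows the typical relationship between the two properties even when the underlying distributions have the long tails described earlier.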

Finally,  the  same  technique  used  to  determine  the 
correlations  between  different  systemic  properties  also 
was  used  to  determine  the  statistical  synergisms  between 
different  parameters.  The  system  in  Figure  1  has  small 
synergisms  for  the  end  product  and  flux  (Table  3),  because 
in  many  cases  (54  out  of  120)  the  sensitivities  are  not 
correlated  with  any  parameter  (statistical  synergism  is 
zero).  Thus,  the  end  product  and  the  steady-state  flux  of 
the  system  are,  on  average,  well  buffered  against  second- 
order  perturbations. 

The  approach  illustrated  in  this  paper  provides  statistical 
insights.  It  might  be  argued  that  biological  systems 
are  optimized  and  atypical,  and  thus  not  compatible 
with  the  application  of  statistical  techniques.  However, 
this  objection  is  avoided  in  our  approach.  By  defining 
behavioral  classes  for  optimized  systems,  we  are  able  to 
study  the  average  behavior  of  optimized  systems  and  not 
just  the  average  behavior  of  random  systems. 

The  methods  we  have  described  can  in  principle  be 
applied  to  systems  with  more  complex  behaviors.  For 


example,  suppose  we  wish  to  consider  biochemical  sys¬ 
tems  that  are  capable  of  exhibiting  either  a  single  locally 
stable  steady  state  (nongrowing  cells  that  are  viable 
but  quiescent)  or  a  single  stable  limit  cycle  (growing 
cells  with  a  well-defined  cycle  time),  depending  only 
upon  the  value  of  an  environmental  cue.  The  behavioral 
classes  that  we  would  define  for  such  systems  would 
now  include  the  combined  properties  of  these  two  dif¬ 
ferent  modalities  as  well  as  the  properties  that  might 
be  applied  to  each  of  the  separate  modalities.  The  more 
complex  behavioral  class  would  include  a  number  of 
dynamic  properties  (e.g.  the  period,  amplitude,  phase, 
and  robustness  of  the  oscillation,  and  the  bifurcation 
value  of  the  environmental  cue  for  switching  between 
modalities),  and  the  analysis  necessary  to  identify  and 
characterize  these  behavioral  classes  would  accordingly 
become  more  complex.  The  local  S-system  representation 
is  capable  of  describing  each  of  the  separate  modalities 
(Lewis,  1991),  but  not  the  two  of  them  together  with  a 
given  set  of  parameter  values.  For  this  purpose  we  would 
need  the  generalized-mass-action  representation  within 
the  power-law  formalism.  This  representation  does  not 
have  analytical  solutions  for  the  steady  state,  and  so  the 
analysis  and  comparison  of  these  properties  would  have 
to  be  done  by  numerical  methods.  Randomly  generated 
sets  of  parameter  values  (which  would  now  include 
values  for  a  parameter  representing  the  environmental 
cue)  could  be  generated  as  before.  However,  we  would 
now  select  only  those  sets  of  parameter  values  that 
satisfy  the  more  complex  behavioral  class  that  includes 
both  modalities  and  the  appropriate  switching  between 
them  in  response  to  the  environmental  cue.  Those  sets 
of  parameter  values  that  only  yield  one  of  the  two 
modalities  would  be  excluded  from  consideration.  This 
would ensure that any averaging procedure that is subsequently
applied to systems with the randomly generated
parameter  sets  would  range  over  a  homogeneous  class  of 
systems. 

The  approach  proposed  in  this  work  also  may  be 
useful  in  providing  information  about  systems  that  are 
poorly  characterized.  For  example,  suppose  we  know 
the  structure  of  a  system,  but  we  are  able  to  determine 
experimentally  only  some  of  the  characteristic  behaviors 
of  the  system.  To  be  more  specific,  suppose  we  know 
that  the  concentrations  of  the  system  are  within  a  given 
range,  that  increasing  the  value  of  a  given  independent 
variable  will  always  cause  a  decrease  in  the  values 
for  some  dependent  variables,  and  that  we  are  able  to 
measure  the  range  of  values  for  the  turnover  times  of  the 
concentrations.  With  this  information,  we  could  generate 
ensembles  of  systems  with  the  described  characteristics 
and  study  them  statistically.  The  results  would  allow  us 
to  make  predictions  about  other  systemic  properties  that 
might  be  measured  and,  for  the  unknown  parameters, 
about the range of values most likely to generate systems
with  the  known  behavior. 

A  combination  of  approaches  will  surely  be  needed  to 
advance  our  understanding  of  large  and  complex  systems 
in  biology.  We  need  to  take  advantage  of  the  broad-scale 
capabilities  of  the  top-down  genomic  technologies  and 
the  structural  constraints  provided  by  the  more  traditional 
bottom-up  methodologies  of  molecular  biology.  We  also 
need  to  identify  the  systemic  regularities  that  exist  even  in 
randomly  constructed  networks.  The  approach  presented 
in  this  paper  appears  well  suited  for  the  determination  of 
such  regularities  in  continuous  models.  It  may  facilitate 
the  design  of  experiments  to  measure  parameters  by 
the  bottom-up  approach  as  well  as  provide  a  suitable 
framework  to  determine  classes  of  models  that  give  a  good 
fit  to  data  obtained  by  the  top-down  approach. 
Abstract

Motivation:  The  method  of  mathematically  controlled 
comparison  has  been  used  for  some  time  to  determine 
which  of  two  alternative  regulatory  designs  is  better 
according  to  specific  quantitative  criteria  for  functional 
effectiveness.  In  some  cases,  the  results  obtained  using 
this  technique  are  general  and  independent  of  parameter 
values  and  the  answers  are  clear-cut.  In  others,  the  result 
might  be  general,  but  the  demonstration  is  difficult  and 
numerical  results  with  specific  parameter  values  can  help 
to  clarify  the  situation.  In  either  case,  numerical  results 
with  specific  parameter  values  can  also  provide  an  answer 
to  the  question  of  how  much  larger  the  values  might  be. 
In  contrast,  a  more  ambiguous  result  is  obtained  when 
either  of  the  alternatives  can  have  the  larger  value  for  a 
given  systemic  property,  depending  on  the  specific  values 
of  the  parameters.  In  any  case,  introduction  of  specific 
values  for  the  parameters  reduces  the  generality  of  the 
results.  Therefore,  we  have  been  motivated  to  develop  and 
apply  statistical  methods  that  would  permit  the  use  of 
numerical  values  for  the  parameters  and  yet  retain  some 
of  the  generality  that  makes  mathematically  controlled 
comparison  so  attractive. 

Results: We illustrate this new numerical method in a step-by-step
application using a very simple didactic example. We also validate
the results by comparison with the corresponding results obtained
using the previously developed analytical method. The analytical
approach is briefly presented for reference purposes, since some of
the same key concepts are needed to understand the numerical method
and the results are needed for comparison. The numerical method
confirms the qualitative differences between the systemic behavior
of alternative designs obtained from the analytical method. In
addition, the numerical method allows for quantification of the
differences and it provides results that are general in a statistical
sense. For example, the older analytical method showed that overall
feedback inhibition in an unbranched pathway makes the system more
robust whereas it decreases the stability margin of the steady state.
The numerical method shows that the magnitudes of these differences
are not comparable. The differences in stability margins (1-2% on
average) are small when compared to the differences in robustness
(50-100% on average). Furthermore, the numerical method shows that
the system with overall feedback responds more quickly to change
than the otherwise equivalent system without overall feedback. These
results suggest reasons why overall feedback inhibition is such a
prevalent regulatory pattern in unbranched biosynthetic pathways.
Contact: savageau@umich.edu

*To whom correspondence should be addressed.


Introduction 

The experimental investigation of biological regulatory
mechanisms has revealed an enormous variety of alternative
molecular designs and raised questions about
their  function,  design  and  evolution.  Mathematically 
controlled  comparison  is  a  technique  that  was  specifically 
developed  to  study  such  alternative  regulatory  designs 
(Savageau,  1972).  By  using  the  mathematical  analog  of 
a  well-controlled  experiment,  this  technique  analytically 
determines  the  irreducible  qualitative  differences  in 
the  systemic  behavior  of  the  alternative  designs.  This 
technique  has  been  used  to  study  alternative  regulatory 
designs  in  metabolic  pathways  (e.g.  Savageau,  1974; 
Hunding,  1974;  Savageau  and  Jacknow,  1979),  in  gene 
circuits  (e.g.  Hlavacek  and  Savageau,  1996),  in  immune 
networks (e.g. Irvine and Savageau, 1985; De Boer and
Hogeweg, 1989a, b), and in the host-pathogen response
to HIV infection (De Boer and Perelson, 1998). The
introduction of numerical values for the parameters
provides quantification of these differences in specific
cases but eliminates the generality of the results. In this
paper we introduce a numerical approach to mathematically
controlled comparisons that allows the introduction
of specific numerical values for the parameters in the
analysis while still retaining the generality of the results
in a statistical sense.

The most common use of mathematically controlled
comparison requires the existence of closed-form solutions
for the steady state. Such solutions can be obtained
by using the local S-system representation to characterize
the pathway of interest. Important functional constraints
are introduced by equating relevant steady-state properties
of the alternative systems being compared. Further
analysis  (dynamic  as  well  as  steady  state)  is  performed 
and  a  profile  of  ratios  for  corresponding  results  from  the 
alternative  systems  is  constructed.  In  some  cases,  a  ratio 
can  be  determined  analytically  to  be  less  than,  equal  to,  or 
greater  than  unity.  For  example,  if  the  ratio  of  values  for 
property  P  in  a  reference  system  to  the  same  property  in 
an  alternate  system  is  larger  than  unity,  then  the  reference 
system  can  always  be  made  to  have  a  larger  value  for 
P  no  matter  how  large  the  value  for  P  in  the  alternate 
system. 

However,  if  one  wishes  to  know  how  much  greater 
than  unity  a  given  ratio  is,  then  one  needs  to  examine 
actual  values  for  the  parameters.  These  parameter  values 
are  not  always  available  or,  if  available,  are  not  always 
accurate.  Moreover,  there  are  cases  in  which  the  ratio 
can  be  less  than  or  greater  than  unity  depending  upon 
the  specific  values  for  the  parameters.  In  any  case,  the 
results  of  such  a  numerical  comparison  are  no  longer 
general.  In  this  work  we  propose  a  novel  approach  to 
this  problem  that  combines  the  method  of  mathematically 
controlled  comparison  with  statistical  techniques  (Alves 
and  Savageau,  2000a, b)  to  yield  numerical  results  that  are 
general  in  a  statistical  sense. 

Although  we  could  describe  the  numerical  method 
in  general  terms,  this  approach  would  be  too  abstract 
and  difficult  to  understand.  Instead,  we  will  illustrate 
this  new  numerical  method  by  means  of  a  step-by-step 
application  using  a  very  simple  didactic  example.  We  also 
validate  the  results  by  comparison  with  the  corresponding 
results  obtained  using  the  previously  developed  analytical 
method.  The  analytical  approach  is  briefly  presented 
for  reference  purposes,  because  some  of  the  same  key 
concepts  are  needed  to  understand  the  numerical  method 
and  because  the  results  are  needed  for  comparison. 
Fig. 1. Three-step unbranched biosynthetic pathways with inhibitory
feedback. The metabolites in the pathways are represented
by X1 to X3, and their concentrations are dependent variables. The
initial substrate is represented by X0 and the modifier of the demand
for end product is represented by X4; the concentrations of these
latter metabolites are considered independent variables. The horizontal
arrows represent chemical conversion, whereas the vertical arrows
represent regulatory influences. This pathway can be viewed as
an abstraction of biosynthetic pathways for amino acids. (A) Pathway
with overall feedback inhibition (reference model). (B) Pathway
without overall feedback inhibition (alternative model).

Methods 

Alternative  models 

The didactic example that we use to illustrate our numerical
method is an unbranched three-step pathway as shown
schematically in Figure 1. This is an abstraction from
actual three-step biosynthetic pathways such as those
involved in the biosynthesis of amino acids (e.g.
http://www.genome.ad.jp/kegg/dblinks/map/map01150.html).
The  independent  variable  X4  represents  the  cell’s  demand 
for  the  end  product  X3.  If  the  cell  requires  large  amounts 
of  X3,  then  the  value  of  X4  will  be  high;  if  small  amounts 
of  X3  are  required,  then  the  value  of  X4  will  be  low. 
These  models  show  the  pathway  with  and  without  end- 
product inhibition (Umbarger, 1956; Yates and Pardee,
1956; Monod et al., 1963), a common feature of such
pathways. We have observed (by consulting the database
at http://wit.mcs.anl.gov//EMP/) that there is usually no
other  feedback  to  the  first  step  of  the  pathway.  However, 
feedback  to  intermediate  reactions  may  exist  and  for  this 
reason  we  consider  models  with  all  possible  intermediary 
feedback  interactions. 

Differential  equations 

The  dynamic  behavior  of  each  model  can  be  described  by 
a  set  of  ordinary  differential  equations,  one  equation  per 
intermediate. This set of equations can be approximated to
the first order in logarithmic space, yielding the canonical
form for the local S-system representation (Savageau,
1969, 1996). For the model in Figure 1A this equation set
becomes:

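$$
\begin{aligned}
dX_1/dt &= \alpha_1 X_0^{g_{10}} X_3^{g_{13}} - \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} \\
dX_2/dt &= \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} - \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} \\
dX_3/dt &= \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} - \alpha_4 X_3^{g_{43}} X_4^{g_{44}}
\end{aligned}
\tag{1}
$$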

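For the model in Figure 1B the equation set becomes:

$$
\begin{aligned}
dX_1/dt &= \alpha_1' X_0^{g_{10}'} - \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} \\
dX_2/dt &= \alpha_2 \prod_{j=1}^{3} X_j^{g_{2j}} - \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} \\
dX_3/dt &= \alpha_3 \prod_{j=2}^{3} X_j^{g_{3j}} - \alpha_4 X_3^{g_{43}} X_4^{g_{44}}
\end{aligned}
\tag{2}
$$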

The multiplicative parameters (rate constants), α, influence
the time scales of the reactions and are always positive.
The exponential parameters (kinetic orders), g, represent
the influence of each metabolite on each aggregate
rate law. If Xi influences the aggregate rate law Vj, either
as a substrate or a modulator, and if an increase in
the concentration of Xi causes an increase in the rate Vj,
then the kinetic order will be positive. If an increase in
the concentration of Xi causes a decrease in the rate Vj,
then the kinetic order will be negative. If an increase in
the concentration of Xi causes neither an increase nor a
decrease in the rate Vj, then the kinetic order will be zero.
Thus, the positive kinetic orders in Equation (1) are g_{i+1,i}
(0 ≤ i ≤ 3), which are the kinetic orders for substrates
of reactions, and g44, which is a scale factor arbitrarily set
equal to 1.0. The remaining kinetic orders are negative,
since these represent negative feedback interactions.
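
As an illustration only, Equations (1) and (2) can be written as a short Python function. This is a minimal sketch: the dictionary keys (`a1`, `g10`, ...) and the function name `rhs` are hypothetical names chosen here, and the parameter values are arbitrary rather than taken from the analysis. Setting `g13 = 0` (together with the primed values for `a1` and `g10`) would give the alternative model of Equation (2).

```python
import numpy as np

def rhs(X, p, X0=1.0, X4=1.0):
    """Right-hand side of the local S-system model of Equation (1).

    X  : concentrations (X1, X2, X3) of the dependent variables
    p  : dict of rate constants a1..a4 and kinetic orders g10, g13, ...
    X0 : initial substrate (independent variable)
    X4 : modifier of the demand for end product (independent variable)
    """
    X1, X2, X3 = X
    v1 = p["a1"] * X0 ** p["g10"] * X3 ** p["g13"]                # synthesis of X1
    v2 = p["a2"] * X1 ** p["g21"] * X2 ** p["g22"] * X3 ** p["g23"]
    v3 = p["a3"] * X2 ** p["g32"] * X3 ** p["g33"]
    v4 = p["a4"] * X3 ** p["g43"] * X4 ** p["g44"]                # demand for X3
    return np.array([v1 - v2, v2 - v3, v3 - v4])

# Arbitrary illustrative values (assumptions, not values from the paper).
params = dict(a1=1.0, a2=1.0, a3=1.0, a4=1.0,
              g10=0.5, g13=-0.4, g21=0.8, g22=-0.1, g23=-0.2,
              g32=0.7, g33=-0.1, g43=0.6, g44=1.0)
```

With all rate constants equal and X0 = X4 = 1, the point X1 = X2 = X3 = 1 is a steady state, which gives a quick sanity check of the signs.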

The  temporal  responsiveness  of  each  model  can  be 
determined  by  perturbing  the  system  variables,  solving 
the  corresponding  dynamic  equations,  and  calculating  the 
time  for  the  dependent  variables  to  settle  within  1%  of 
their final steady-state values.

Steady-state  solution  and  key  systemic  properties 

At the steady state, which can be analytically determined,
both the production and consumption terms have identical
values. One can write the following matrix equation
(Savageau, 1969):

$$
\begin{bmatrix} b_1 - g_{10} Y_0 \\ b_2 \\ b_3 + Y_4 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix},
\qquad B = AY
\tag{3}
$$

where b_i = ln(α_{i+1}/α_i), a_{ij} = g_{ij} − g_{i+1,j} and
Y_i = ln(X_i). Equation (3) is linear and therefore easily
solved to obtain the steady-state values for each Y_i; the
corresponding values for each X_i are then obtained by
simple exponentiation.
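
The linear solve in Equation (3) can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the names `system_matrix` and `steady_state` and the dictionary keys are hypothetical, kinetic orders that do not appear in Equation (1) are taken to be zero, and the third entry of B is written as b3 + g44·Y4, which reduces to b3 + Y4 when g44 = 1 as in the text.

```python
import numpy as np

def system_matrix(p):
    """Matrix A of Equation (3), a_ij = g_ij - g_(i+1)j for the reference model."""
    return np.array([
        [-p["g21"], -p["g22"],           p["g13"] - p["g23"]],
        [ p["g21"],  p["g22"] - p["g32"], p["g23"] - p["g33"]],
        [ 0.0,       p["g32"],           p["g33"] - p["g43"]],
    ])

def steady_state(p, X0=1.0, X4=1.0):
    """Solve A Y = B for Y = ln(X) and return X = exp(Y)."""
    b = np.log([p["a2"] / p["a1"], p["a3"] / p["a2"], p["a4"] / p["a3"]])
    B = b + np.array([-p["g10"] * np.log(X0), 0.0, p["g44"] * np.log(X4)])
    Y = np.linalg.solve(system_matrix(p), B)
    return np.exp(Y)

# Arbitrary illustrative values (assumptions, not values from the paper).
params = dict(a1=2.0, a2=1.5, a3=1.0, a4=0.5,
              g10=0.5, g13=-0.4, g21=0.8, g22=-0.1, g23=-0.2,
              g32=0.7, g33=-0.1, g43=0.6, g44=1.0)
```

A simple check is that the four aggregate rates evaluated at the returned concentrations coincide, since every step of an unbranched pathway carries the same flux at steady state.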

Two types of coefficients, logarithmic gains and parameter
sensitivities, can be used to characterize the steady state
of such models (Savageau, 1971). Logarithmic gains measure
the relative influence of each independent variable on
each dependent variable of the model. For example,

$$
L(X_i, X_0) = \frac{\partial \log X_i}{\partial \log X_0} = \frac{\partial Y_i}{\partial Y_0}
\tag{4}
$$

measures the percent change in the concentration of intermediate
Xi caused by a percentage change in the concentration
of the initial substrate X0. Logarithmic gains provide
important information concerning the amplification
or attenuation of signals as they are propagated through
the system.
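
Because Equation (3) is linear in the logarithms, the logarithmic gains follow from differentiating B with respect to Y0 or Y4 and solving with the same matrix A. The sketch below assumes the reference-model matrix A with absent kinetic orders set to zero; the function name `log_gains` and the numerical values are illustrative assumptions.

```python
import numpy as np

def log_gains(A, g10, g44=1.0):
    """Return a 3x2 array: column 0 holds L(X_i, X0), column 1 holds L(X_i, X4).

    From A Y = B: dY/dY0 = A^-1 (-g10, 0, 0)^T and dY/dY4 = A^-1 (0, 0, g44)^T.
    """
    dB = np.array([[-g10, 0.0],
                   [0.0,  0.0],
                   [0.0,  g44]])
    return np.linalg.solve(A, dB)

# Arbitrary illustrative kinetic orders (assumptions, not values from the paper).
g10, g13, g21, g22, g23, g32, g33, g43 = 0.5, -0.4, 0.8, -0.1, -0.2, 0.7, -0.1, 0.6
A = np.array([[-g21, -g22,       g13 - g23],
              [ g21,  g22 - g32, g23 - g33],
              [ 0.0,  g32,       g33 - g43]])
L = log_gains(A, g10)
```

For the end product the result can be compared with the closed forms L(X3, X0) = g10/(g43 − g13) and L(X3, X4) = −g44/(g43 − g13), which follow from the overall flux balance between the first and last steps.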

Parameter  sensitivities  measure  the  relative  influence  of 
each  parameter  on  each  dependent  variable  of  the  model. 
For  example, 


$$
S(X_i, p_j) = \frac{\partial \log X_i}{\partial \log p_j} = \frac{p_j}{X_i}\,\frac{\partial X_i}{\partial p_j}
\tag{5}
$$

measures the percent change in the concentration of
intermediate Xi caused by a percentage change in the
value of the parameter pj. Parameter sensitivities provide
important  information  about  system  robustness,  i.e.  how 
sensitive  the  system  is  to  perturbations  in  the  parameters 
that  define  the  structure  of  the  system. 
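
When a closed form is inconvenient, a parameter sensitivity of the kind defined in Equation (5) can be approximated by a central difference on a relative perturbation of the parameter. The following is a generic sketch, not the procedure used in the paper: `log_sensitivity` is a hypothetical helper, `f` stands for any function that maps a parameter dictionary to a positive steady-state quantity, and the toy function in the example is chosen only because its sensitivities are known exactly.

```python
import numpy as np

def log_sensitivity(f, params, name, rel_step=1e-6):
    """Approximate S = d ln f / d ln|p| for the parameter `name`.

    The parameter is scaled by (1 +/- rel_step), so the estimate also works
    for negative parameters such as inhibitory kinetic orders.
    """
    up = dict(params)
    down = dict(params)
    up[name] = params[name] * (1.0 + rel_step)
    down[name] = params[name] * (1.0 - rel_step)
    d_log_f = np.log(f(up)) - np.log(f(down))
    d_log_p = np.log1p(rel_step) - np.log1p(-rel_step)
    return d_log_f / d_log_p

# Toy example: X = 3 k^2 / m has sensitivities 2 (to k) and -1 (to m).
toy = lambda p: 3.0 * p["k"] ** 2 / p["m"]
S_k = log_sensitivity(toy, {"k": 1.7, "m": 0.4}, "k")
S_m = log_sensitivity(toy, {"k": 1.7, "m": 0.4}, "m")
```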

Since  steady-state  solutions  exist  in  closed  form  we  can 
calculate  each  of  the  two  types  of  coefficients  simply  by 
taking the appropriate derivatives. Although the mathematical
operations involved are the same in each case, it is
important  to  keep  in  mind  that  the  biological  significance 
of  the  two  types  of  coefficients  is  very  different. 

The  local  stability  of  the  steady  state  can  be  determined 
by applying the Routh criteria (Dorf, 1992). The magnitude
of the two critical Routh conditions can be used to
quantify  the  margin  of  stability  (Hlavacek  and  Savageau, 
1996). 
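
For a three-variable system the Routh test can be sketched as below. The characteristic polynomial of the Jacobian J is written λ³ + a1λ² + a2λ + a3, and the standard Routh-Hurwitz conditions for a cubic are a1 > 0, a3 > 0 and a1a2 − a3 > 0. Which of these quantities the cited work uses as its "critical" conditions is not stated here, so returning all three is an assumption of this sketch, as are the function names.

```python
import numpy as np

def routh_conditions(J):
    """Return (a1, a3, a1*a2 - a3) for a 3x3 Jacobian J.

    np.poly(J) gives the coefficients [1, a1, a2, a3] of the
    characteristic polynomial of J.
    """
    _, a1, a2, a3 = np.poly(np.asarray(J, dtype=float))
    return a1, a3, a1 * a2 - a3

def is_locally_stable(J):
    """Locally stable if all three Routh-Hurwitz quantities are positive."""
    return all(c > 0 for c in routh_conditions(J))

# Example: eigenvalues -1, -2, -3 give the polynomial (l+1)(l+2)(l+3).
J_stable = np.diag([-1.0, -2.0, -3.0])
J_unstable = np.diag([1.0, -2.0, -3.0])
```

The magnitudes of the returned quantities, rather than only their signs, are what a margin-of-stability measure would be built from.
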
Responsiveness

Systems should respond quickly to changes in their environment.
To evaluate temporal responsiveness, perturbations
of 20% were made in the steady-state values of the
intermediate concentrations and the time required for them
to settle within 1% of their final steady-state values was
then calculated. This also gives an indication of the transit
time for metabolites in the system. These transit times
should be low. There is no exact way to determine the transient
time analytically. Thus, this part of the analysis will
be dealt with only in the numerical section of the results.
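
A settling-time calculation of this kind can be sketched with a fixed-step Runge-Kutta integrator. This is a generic illustration rather than the Mathematica procedure used in the paper: the function name, the step size and the time horizon are assumptions, and `f` is any function returning the time derivatives of the state.

```python
import numpy as np

def settling_time(f, x_start, x_final, tol=0.01, dt=1e-3, t_max=100.0):
    """Time after which every variable stays within tol*|x_final| of x_final.

    Integrates dx/dt = f(x) with classical RK4 from x_start. If the band is
    never reached within t_max, the returned value is close to t_max.
    """
    x = np.asarray(x_start, dtype=float)
    x_final = np.asarray(x_final, dtype=float)
    t, last_outside = 0.0, None
    while t < t_max:
        if np.any(np.abs(x - x_final) > tol * np.abs(x_final)):
            last_outside = t                      # still outside the 1% band
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return 0.0 if last_outside is None else last_outside + dt

# Toy check: dx/dt = -(x - 1), perturbed 20% above its steady state x = 1.
relax = lambda x: -(x - 1.0)
```

For the toy system the deviation decays as 0.2·exp(−t), so the 1% band is reached at t = ln 20, which the numerical estimate should reproduce to within one step.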

Generation  of  random  ensembles 
The  analytical  results  give  qualitative  information  that 
characterizes  the  role  of  overall  feedback  in  the  system  of 
Figure  1A.  To  obtain  quantitative  information,  one  must 
introduce  specific  values  for  the  parameters  and  compare 
systems.  For  this  purpose  we  have  randomly  generated 
a  large  ensemble  of  parameter  sets  and  selected  5000 
of  these  sets  that  define  systems  consistent  with  various 
physical  and  biochemical  constraints.  These  constraints 
include  mass  balance,  low  concentrations  of  intermediates 
and  small  changes  in  their  value  to  minimize  the  utilization 
of  the  solvent  capacity  in  the  cell,  small  values  for 
parameter  sensitivities  so  as  to  desensitize  the  system  to 
spurious  fluctuations  affecting  its  structure,  and  stability 
margins  large  enough  to  ensure  local  stability  of  the 
systems.  A  detailed  description  of  these  methods  can  be 
found  in  Alves  and  Savageau  (2000b).  Mathematica™ 
(Wolfram,  1997)  was  used  for  all  numerical  procedures. 
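
The generate-and-select step can be sketched as a rejection sampler. The sketch below is deliberately reduced and is not the procedure of Alves and Savageau (2000b): the sampling ranges, the concentration bound and the use of eigenvalues as the stability test are all assumptions made for illustration, the sensitivity constraint is omitted, and X0 = X4 = 1 is assumed throughout.

```python
import numpy as np

SUBSTRATE = ("g10", "g21", "g32", "g43")   # positive kinetic orders
FEEDBACK = ("g13", "g22", "g23", "g33")    # inhibitory kinetic orders (<= 0)

def sample_parameters(rng):
    """Draw one parameter set from assumed ranges (log-uniform rate constants)."""
    p = {f"a{i}": 10.0 ** rng.uniform(-1.0, 1.0) for i in range(1, 5)}
    p.update({k: rng.uniform(0.1, 2.0) for k in SUBSTRATE})
    p.update({k: -rng.uniform(0.0, 2.0) for k in FEEDBACK})
    return p

def is_acceptable(p, max_log_conc=np.log(100.0)):
    """Keep sets with bounded steady-state concentrations and a stable steady state."""
    A = np.array([[-p["g21"], -p["g22"],           p["g13"] - p["g23"]],
                  [ p["g21"],  p["g22"] - p["g32"], p["g23"] - p["g33"]],
                  [ 0.0,       p["g32"],           p["g33"] - p["g43"]]])
    B = np.log([p["a2"] / p["a1"], p["a3"] / p["a2"], p["a4"] / p["a3"]])
    Y = np.linalg.solve(A, B)                  # log concentrations
    if np.any(np.abs(Y) > max_log_conc):       # concentration outside assumed range
        return False
    X = np.exp(Y)
    V = p["a4"] * X[2] ** p["g43"]             # steady-state flux (X4 = 1)
    J = np.diag(V / X) @ A                     # Jacobian in logarithmic coordinates
    return bool(np.all(np.linalg.eigvals(J).real < 0.0))

def generate_ensemble(n, seed=0):
    """Sample until n acceptable parameter sets have been collected."""
    rng = np.random.default_rng(seed)
    ensemble = []
    while len(ensemble) < n:
        p = sample_parameters(rng)
        if is_acceptable(p):
            ensemble.append(p)
    return ensemble
```

Calling `generate_ensemble(5000)` would mimic the ensemble size used in the text; further constraints can be added as extra checks inside `is_acceptable`.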

Density  of  ratios  plot 

To  interpret  the  ratios  that  result  from  our  analysis  we  use 
Density  of  Ratios  plots  as  defined  in  Alves  and  Savageau 
(2000a).  The  primary  density  plots  from  the  raw  data  have 
the  magnitude  for  some  property  of  the  reference  model 
on  the  x-axis  and  the  corresponding  ratio  of  magnitudes 
(reference  model  to  alternative  model)  on  the  y-axis.  The 
primary plot can be viewed as a list of 5000 paired
values that can be ordered with respect to the reference
magnitude to form a list L1 in which the first pair has
the lowest measured value for property P in the reference
model, the second has the second lowest, and so on.
Secondary density plots are constructed from the primary
plots by the use of moving quantile techniques with a
window size of 500. The procedure is as follows. One
collects the first 500 ratios from the list L1, calculates the
quantile of interest for this sample, and pairs this number,
denoted ⟨R⟩, with the median value of the corresponding P values
of the reference model, denoted ⟨P⟩. One advances the
window by one position, collects ratios 2 to 501, calculates
⟨R⟩, and pairs it with the corresponding ⟨P⟩ value, and
continues in this manner until the last ratio from the list
L1 has been used for the first time (for further explanation
of moving median techniques see, e.g. Hamilton, 1994).
The  slope  in  the  secondary  plot  measures  the  degree  of 
correlation  between  the  quantities  plotted  on  the  x-  and  y- 
axes.  This  technique  is  also  used  to  examine  correlations 
between  ratios  of  interest  and  other  magnitudes  shared 
by  the  two  systems,  e.g.  the  correlation  between  the  ratio 
of  stability  margins  and  the  magnitude  of  a  rate  constant 
common  to  the  two  systems  (for  traditional  applications  of 
correlation  analysis  see  Wherry,  1984). 
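
The moving quantile construction described above can be sketched as follows. The function name and argument names are hypothetical; the window size of 500 and the pairing of a ratio quantile with the median reference value follow the description in the text.

```python
import numpy as np

def moving_quantile(p_ref, ratios, window=500, q=0.5):
    """Secondary density-of-ratios data from paired primary data.

    p_ref  : values of property P in the reference model
    ratios : corresponding ratios (reference / alternative)
    Returns arrays (P, R): the median of P and the q-quantile of the ratios
    in each window of the list ordered by p_ref.
    """
    p_ref = np.asarray(p_ref, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    order = np.argsort(p_ref)                   # build the ordered list L1
    p_sorted, r_sorted = p_ref[order], ratios[order]
    n_windows = len(p_sorted) - window + 1      # stop when the last ratio is first used
    P = np.array([np.median(p_sorted[i:i + window]) for i in range(n_windows)])
    R = np.array([np.quantile(r_sorted[i:i + window], q) for i in range(n_windows)])
    return P, R

# Tiny example with a window of 3 instead of 500.
P, R = moving_quantile([3, 1, 2, 5, 4], [30, 10, 20, 50, 40], window=3)
```

With 5000 pairs and a window of 500 this yields 4501 points for the secondary plot.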

Analytical  comparison 

First, we shall exemplify the analytical aspects of a
mathematically controlled comparison aimed at discovering
the advantages, if any, brought about by overall
feedback  inhibition.  This  will  serve  to  introduce  key 
concepts  that  will  be  needed  for  the  numerical  aspects  in 
the  following  section.  Also,  the  results  will  be  used  for 
later  comparison  to  validate  the  results  obtained  with  the 
new  numerical  method. 

We  compare  the  systemic  behavior  of  the  model  in 
Figure 1A (reference model) with that in Figure 1B
(alternative  model).  To  ensure  that  the  results  are  due 
solely  to  the  differences  in  design  and  not  reversible  by 
a  mere  change  in  parameter  value,  we  shall  insist  on  the 
following  mathematical  controls. 

Internal  and  external  equivalence 
Only  the  first  step  in  the  pathway  is  allowed  to  differ 
between the reference model and the alternative. Therefore,
to establish an internal equivalence (Savageau, 1972,
1976;  Irvine,  1991)  between  the  two  designs,  we  require 
the  values  for  the  corresponding  parameters  of  all  other 
steps  in  the  two  models  to  be  the  same. 

The  first  step  of  the  pathway  differs  between  the 
reference  model  and  the  alternative.  If  we  reason  that 
loss  or  gain  of  an  inhibitory  site  on  the  first  enzyme 
comes  about  by  mutation,  and  that  this  mutation  can 
cause  changes  in  all  the  parameters  of  the  process,  then 
(taking  the  model  in  Figure  1A  as  reference)  a  mutation 
causing  loss  of  overall  feedback  inhibition  would  change 
the parameters g10, g13 and α1 in Equation (1) to g'10,
g'13 = 0 and α'1 in Equation (2). Since we wish to
determine the effects that are due solely to changes in the
structure of the system, we shall specify new values for
g'10 and α'1 that minimize all other effects. This can be
accomplished by deriving the mathematical expression for
a given steady-state property in each of the two models,
equating these expressions, and then solving the constraint
equation for the value of a primed parameter. For example,
if we derive expressions for the logarithmic gains and
require that L(Xi, X0)_A = L(Xi, X0)_B, then this equation
can be solved to determine the following value for g'10.
$$
g_{10}' = \frac{g_{10}\, g_{43}}{g_{43} - g_{13}}
\tag{6}
$$
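
The constraint leading to Equation (6) can be traced in a few lines for the end product X3. The intermediate steps below are not in the original text; they are a sketch that follows from Equations (1) and (2) as written. At steady state every step carries the same flux, so the first and last rate laws must balance in each model:

$$
\begin{aligned}
\text{reference:}\quad & \alpha_1 X_0^{g_{10}} X_3^{g_{13}} = \alpha_4 X_3^{g_{43}} X_4^{g_{44}}
&&\Rightarrow\quad L(X_3, X_0)_A = \frac{g_{10}}{g_{43} - g_{13}} \\
\text{alternative:}\quad & \alpha_1' X_0^{g_{10}'} = \alpha_4 X_3^{g_{43}} X_4^{g_{44}}
&&\Rightarrow\quad L(X_3, X_0)_B = \frac{g_{10}'}{g_{43}}
\end{aligned}
$$

Setting the two gains equal and solving for the primed kinetic order gives g'10 = g10 g43/(g43 − g13). Because g13 is negative, g'10 is smaller than g10.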


Similarly, we can derive expressions for the steady-state
concentrations and require that [Xi]_A = [Xi]_B; this
equation can then be solved to determine the following value
for α'1.

$$
\log \alpha_1' = \frac{g_{43} \log \alpha_1 + g_{13} \log \beta_3}{g_{43} - g_{13}}
\tag{7}
$$

These  particular  values  for  the  primed  parameters  make 
the  steady-state  flux,  each  of  the  corresponding  steady- 
state  concentrations,  and  the  logarithmic  gains  in  each 
of  these  quantities  the  same  in  both  models.  The  process 
we  have  just  described  determines  the  maximal  degree  of 
external  equivalence  between  the  two  models.  There  are 
no  more  ‘free’  parameters  that  can  be  used  to  reduce  the 
differences  and  all  remaining  differences  can  be  attributed 
to  the  change  in  system  structure,  i.e.  to  overall  feedback 
inhibition. The external equivalency conditions we require
ensure that both the reference and the alternative models
have  the  same  steady-state  concentrations  and  rates.  This 
implies  that  the  steady-state  thermodynamic  potential 
across  each  corresponding  reaction  is  the  same  in  the 
reference  and  the  alternative  models.  Having  established 
the  conditions  for  maximal  equivalence,  we  can  now 
analyze  the  two  models  and  determine  their  remaining 
differences. 


Pathway  gain 

The  logarithmic  gains  in  concentrations  and  fluxes  with 
respect to changes in the initial substrate X0 determine
whether the pathway is amplifying or homeostatic. When
comparing pathways designed to amplify biochemical signals
it is important that the alternatives provide the same
high logarithmic gain. Conversely, when comparing pathways
designed to attenuate biochemical signals it is important
that the alternatives have the same low logarithmic
gain. The method of mathematically controlled comparison
ensures that both the compared models will have the
same logarithmic gains and thus have the same amplifying
or homeostatic characteristics.

The  logarithmic  gains  in  concentration  with  respect 
to  changes  in  demand  (represented  here  by  changes  in 
the  modulator  X4)  are  smaller  in  the  reference  system, 
whereas  the  logarithmic  gain  in  flux  with  respect  to 
changes  in  X4  is  larger  in  the  reference  system.  In  this 
aspect,  the  reference  system  is  more  efficient  because 
it  can  produce  greater  increases  in  flux  with  smaller 
increases  in  concentration.  When  the  logarithmic  gain  in 
flux  with  respect  to  changes  in  demand  is  zero  (as  is  the 
case  in  the  alternative  model),  changes  in  demand  have  no 
influence  over  the  flux.  Thus,  overall  feedback  inhibition 
makes  the  system  better  equipped  to  deal  with  changes  in 
the  demand  for  X3.  These  results  are  shown  in  Table  1. 
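
As a worked illustration of these statements for the end product and the flux (a sketch derived here from Equations (1) and (2), not reproduced from the original), the steady-state flux balance between the first and last steps gives

$$
\begin{aligned}
L(X_3, X_4)_A &= -\frac{g_{44}}{g_{43} - g_{13}}, &
L(X_3, X_4)_B &= -\frac{g_{44}}{g_{43}}, &
\frac{L(X_3, X_4)_A}{L(X_3, X_4)_B} &= \frac{g_{43}}{g_{43} - g_{13}} < 1, \\
L(V, X_4)_A &= g_{43}\, L(X_3, X_4)_A + g_{44} = -\frac{g_{13}\, g_{44}}{g_{43} - g_{13}} > 0, &
L(V, X_4)_B &= 0 .
\end{aligned}
$$

The inequalities use g13 < 0 and g43 > 0. The flux expressions follow from V = α4 X3^{g43} X4^{g44}: in the alternative model the fall in X3 exactly cancels the rise in X4, whereas with overall feedback the fall in X3 is smaller and the flux increases with demand.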


Table 1. Analytical expressions for the ratios of corresponding systemic
properties of the reference system and the alternative system

Systemic      Dependent variable of the system
property      X1      X2      X3      V
-----------------------------------------------
L(·, X0)      1       1       1       1
L(·, X4)      B^a     C       A       1/0^b
S(·, α1)      A       A       A       A
S(·, α2)      1       1       1       1
S(·, α3)      1       1       1       1
S(·, α4)      B       C       A       1/0
S(·, g10)     1       1       1       1
S(·, g13)     --      --      --      --
S(·, g21)     1       1       1       1
S(·, g22)     1       1       1       1
S(·, g23)     1       1       1       1
S(·, g32)     1       1       1       1
S(·, g33)     1       1       1       1
S(·, g43)     B       C       A       1/0

a The three critical ratios are given by the following analytical
expressions:

$$
\begin{aligned}
A &= \frac{g_{43}}{g_{43} - g_{13}} < 1 \\
B &= 1 + \frac{g_{13}\left[ g_{22}(g_{33} - g_{43}) + g_{32}(g_{43} - g_{23}) \right]}{(g_{13} - g_{43})(g_{23} g_{32} - g_{22} g_{33})} < 1 \\
C &= 1 + \frac{g_{13}(g_{43} - g_{33})}{g_{33}(g_{13} - g_{43})} < 1
\end{aligned}
$$

b The ratio 1/0 represents the division of any non-zero number by
zero.


Robustness 

The  system  should  be  robust,  i.e.  insensitive  to  fluctuations 
in  the  parameter  values  (Shiraishi  and  Savageau,  1992). 
This  means  that  the  sensitivity  profile  should,  in  general, 
be  as  low  as  possible.  Whenever  the  sensitivity  of  a 
concentration  to  a  parameter  is  different  in  the  two 
models,  it  is  smaller  in  the  reference  model,  i.e.  the 
ratio S(Xi, pj)_A/S(Xi, pj)_B is always less than or equal
to  unity.  Thus,  overall  feedback  inhibition  makes  each 
intermediate  concentration  less  sensitive  (i.e.  more  robust) 
with  respect  to  fluctuations  in  parameter  values. 

Most  of  the  corresponding  flux  sensitivities  are  equal 
in  the  reference  and  alternative  models.  The  sensitivity 
S(V, α1) is smaller in the reference model, which makes
this model less sensitive to changes in the molecular
activity of the first enzyme, whereas the sensitivities
S(V, g43) and S(V, α4) are larger in the reference system
because they also can reflect changes in demand. These
results are shown in Table 1.

Stability 

The  steady  state  of  the  system  should  be  stable,  i.e.  the 
system  should  return  to  its  original  steady  state  after  a 
small  perturbation.  If  this  does  not  occur,  the  system 
is  dysfunctional.  The  margins  of  stability  for  a  system 
can  be  measured  using  the  Routh  criteria  (Hlavacek  and 
Savageau, 1996). The larger these margins, the further
from  the  boundaries  of  instability  the  system  will  be.  The 
results  of  the  analysis  are  as  follows: 

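$$
\frac{\text{Criterion\#1}_A}{\text{Criterion\#1}_B} = \frac{\text{Criterion\#2}_A}{\text{Criterion\#2}_B} = 1,
\qquad
\frac{\text{Criterion\#3}_A}{\text{Criterion\#3}_B} = 1 + \frac{F_1 F_2 F_3\, g_{13} g_{21} g_{32}}{D} < 1
\tag{8}
$$

where

$$
\begin{aligned}
D ={}& F_1^2 F_2\, g_{21}^2 g_{32} - F_1 F_2^2\, g_{21} g_{22} g_{32} \\
& - F_1 F_2 F_3\, g_{21} g_{23} g_{32} + F_2^2 F_3\, g_{22} g_{23} g_{32} \\
& + F_1 F_2^2\, g_{21} g_{32}^2 - F_2^2 F_3\, g_{23} g_{32}^2 \\
& - F_1^2 F_3\, g_{21}^2 g_{33} + 2 F_1 F_2 F_3\, g_{21} g_{22} g_{33} \\
& - F_2^2 F_3\, g_{22}^2 g_{33} - 2 F_1 F_2 F_3\, g_{21} g_{32} g_{33} \\
& + F_2^2 F_3\, g_{22} g_{32} g_{33} + F_2 F_3^2\, g_{23} g_{32} g_{33} \\
& + F_1 F_3^2\, g_{21} g_{33}^2 - F_2 F_3^2\, g_{22} g_{33}^2 \\
& + F_1^2 F_3\, g_{21}^2 g_{43} - 2 F_1 F_2 F_3\, g_{21} g_{22} g_{43} \\
& + F_2^2 F_3\, g_{22}^2 g_{43} + 2 F_1 F_2 F_3\, g_{21} g_{32} g_{43} \\
& - 2 F_2^2 F_3\, g_{22} g_{32} g_{43} - F_2 F_3^2\, g_{23} g_{32} g_{43} \\
& + F_2^2 F_3\, g_{32}^2 g_{43} - 2 F_1 F_3^2\, g_{21} g_{33} g_{43} \\
& + 2 F_2 F_3^2\, g_{22} g_{33} g_{43} - F_2 F_3^2\, g_{32} g_{33} g_{43} \\
& + F_1 F_3^2\, g_{21} g_{43}^2 - F_2 F_3^2\, g_{22} g_{43}^2 \\
& + F_2 F_3^2\, g_{32} g_{43}^2
\end{aligned}
$$

with F_i being the turn-over number of the pool X_i.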
Note  that  the  negative  signs  in  this  expression  always 
precede  parameters  that  represent  negative  feedback  and 
consequently  all  terms  have  positive  values.  Thus,  the 
reference  and  the  alternative  models  differ  in  only  one 
of  the  three  Routh  criteria  applicable  for  a  three  variable 
system,  and  the  alternative  system  has  the  larger  margin  of 
stability. 

Summary 

The  analytical  comparison  gives  qualitative  results  that 
characterize  the  role  of  overall  feedback  inhibition  in  the 
model  of  Figure  1A.  This  analysis  demonstrates  that  the 
model  with  overall  feedback  inhibition  is  more  robust  and 
that  its  flux  is  more  responsive  to  changes  in  demand 
for  the  end  product,  although  this  model  has  a  smaller 
margin  of  stability.  However,  this  analysis  does  not  tell 
us  how  much  more  robust  or  how  much  more  responsive 
to  demand  the  reference  model  is,  nor  does  it  tell  us  how 
much  smaller  its  margin  of  stability  is.  For  answers  to 
these  questions  we  must  consider  specific  values  for  the 
parameters  and  employ  statistical  techniques  if  we  are  to 
uncover  general  tendencies. 


Numerical  comparison 

The techniques described in Alves and Savageau (2000b)
have been used to generate an ensemble of 5000 parameter
sets that characterize the reference and the alternative systems
with stable steady states. Each of these parameter sets was
then inserted into the appropriate equations to determine
the magnitude of the quantitative differences between
reference and alternative systems.

Ratios  of  systemic  properties 

Figure 2A shows a typical Density of Ratios plot for an
individual parameter sensitivity. One can clearly see that
$S(X_i, p_j)_A / S(X_i, p_j)_B < 1$. Figure 2C shows a typical
example of a Density of Ratios plot for the aggregate
parameter sensitivities of a concentration variable. The
aggregate parameter sensitivity of $X_i$, $S(X_i)$, is defined
as the Euclidean norm of the vector whose coordinates are
the sensitivities with respect to the individual parameters.
[The  numerical  method  makes  it  possible  to  use  different 
functions  of  the  parameter  sensitivities  to  define  an 
aggregate  sensitivity,  e.g.  a  weighted  average  of  the 
sensitivities  could  be  used  when  one  knows  the  relative 
importance  of  the  individual  parameters  in  the  model.] 
The  ratio  is  defined  as  the  aggregate  parameter  sensitivity 
in  the  reference  model  divided  by  the  corresponding 
aggregate  in  the  alternative  model.  Again,  we  see  that  the 
reference  model  has  smaller  sensitivities. 
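
The aggregate sensitivity and its ratio can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the numerical values are hypothetical, and the weighted variant uses absolute values of the sensitivities, which is an assumption about how a weighted average would be formed.

```python
import math

def aggregate_sensitivity(sensitivities):
    """Euclidean norm of the vector of individual parameter sensitivities."""
    return math.sqrt(sum(s * s for s in sensitivities))

def weighted_aggregate(sensitivities, weights):
    """Weighted average of absolute sensitivities (assumed form of the
    alternative aggregate mentioned in the text)."""
    return sum(w * abs(s) for s, w in zip(sensitivities, weights)) / sum(weights)

# Hypothetical sensitivities of one concentration to three parameters.
reference = [0.3, -0.4, 0.0]
alternative = [0.6, -0.8, 0.0]

# Ratio as defined in the text: reference aggregate over alternative aggregate.
ratio = aggregate_sensitivity(reference) / aggregate_sensitivity(alternative)
print(ratio)
```

A ratio below 1 corresponds to the reference model having the smaller aggregate sensitivity for that parameter set.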

A  comparison  of  the  models  on  the  basis  of  the  3rd 
Routh  criterion  for  stability  (Figure  2E)  shows  that  the 
margin  of  stability  is  smaller  for  the  reference  model; 
however,  the  magnitude  of  the  difference  is  very  small 
with  ratios  always  greater  than  0.81.  The  modal  class 
of this ratio is the one closest to 1 (defined as 0.995
< ratio < 1), with more than 35% of the models. Thus,
models  with  or  without  overall  feedback  inhibition  have 
very  similar  stability  boundaries.  This  indicates  that 
local  stability  is  probably  not  an  important  criterion  in 
comparing  the  models,  since  they  are  very  similar  in 
this  aspect.  Figure  2G  shows  that  the  model  with  overall 
feedback  inhibition  is  typically  more  responsive  than  the 
alternative  model  lacking  this  inhibition,  although  there 
are  a  few  exceptions. 

Statistical  analysis  of  ratios 

Figures  2B,  D,  F  and  2H  show  the  moving  median  plots 
(Alves  and  Savageau,  2000a)  corresponding  to  the  raw 
Density  of  Ratios  plots  in  Figures  2A,  C,  E  and  2G.  As 
was  mentioned  previously,  robust  systems  function  more 
reproducibly.  Figure  2B  shows  an  example  of  a  moving 
median  plot  for  an  individual  parameter  sensitivity.  There 
are  two  regions  in  which  there  is  no  correlation  between 
the  sensitivity  in  the  reference  model  and  the  ratio  of 
sensitivities  in  the  reference  and  alternative  models.  These 
two  regions  are  separated  by  a  region  with  a  sharp  change 

Fig. 2. Density of ratios plots for different systemic properties of the reference system vs the ratio of their values in the reference system to
the corresponding value in the alternative system. Density of ratios plots for the primary data are shown in the left-hand panels (A, C, E and
G); the corresponding moving median plots are shown in the right-hand panels (B, D, F and H). (A) and (B) a typical individual sensitivity,
(C) and (D) a typical aggregate sensitivity (the aggregate sensitivity of any given metabolite is defined in this case as the Euclidean norm of
a vector whose components are given by the sensitivity of the relevant metabolite to each of the parameters), (E) and (F) the Routh condition
that differs between the reference and the alternative system, (G) and (H) the transient time.

in the average value of the ratio. For most other parameter
sensitivities, the ratio changes less abruptly. The moving
median plot for the aggregate parameter sensitivities
(Figure 2D) shows that as the sensitivity increases (i.e. the
robustness decreases) the ratio also tends to increase, until
it reaches a limit median value. For highly robust models,

Fig. 3. Histograms comparing properties that differ between the reference system and the alternative system in Figure 1. In the left-hand
panels (A, C, E, G and I) the histogram of the relevant property for the reference system is represented by a thick line whereas the same
histogram for the alternative system is represented by a thin line. In the right-hand panels (B, D, F, H and J) the histogram for the ratio
is represented with a thick line. (A) and (B) aggregate sensitivity of X1, (C) and (D) aggregate sensitivity of X2, (E) and (F) aggregate
sensitivity of X3, (G) and (H) 3rd Routh condition, (I) and (J) transient time.

the difference in robustness between the reference and the
alternative  model  tends  to  be  bigger.  Thus,  for  models 
that  are  optimized  with  regard  to  robustness,  on  average, 
the  reference  model  will  be  much  more  robust  than  the 
alternative  model. 

Figure 2F shows that the stability margin of the alternative
model is always greater than that of the reference
model, although on average the differences are insignificant.
Hence, the stability margins are essentially the same.

As  for  the  transient  behavior  of  the  models  (Figure  2H), 
we  can  see  that  the  responsiveness  of  the  reference  model 
is  almost  always  better  than  or  equal  to  that  of  the 
alternative  model  (more  than  98%  of  all  cases).  Overall 
feedback  inhibition  has  an  important  effect  in  making  the 
model  respond  more  quickly  to  perturbations  in  its  state. 
Since  there  is  no  analytical  expression  for  the  transient 
behavior,  the  only  way  to  obtain  these  comparative  results 
is  through  the  use  of  numerical  methods. 

A different way to observe that the parameter sensitivities
are indeed larger in the alternative model is by comparing
histograms of corresponding sensitivities that differ
between the reference and the alternative model and
by plotting the histograms of the ratios directly (Figure 3).
The alternative model clearly has more parameter sensitivities
in the higher range of values. This approach also
shows that the transient times are longer for the alternative
model, whereas the difference in distributions for the stability
margin is less notable. In each case, the histogram
of ratios shows that the magnitudes are larger in the alternative
model.

Correlations 

The previous paper (Alves and Savageau, 2000b) has
shown how different properties of the model represented
in Figure 1A are correlated. Here we use the same
technique to show how the differences between reference
(Figure 1A) and alternative (Figure 1B) models are correlated
with various steady-state properties. The differences
we shall examine are the four analytically determined ratios
shown in Table 1 (A-D) plus the ratio of transient times
that we determine numerically (E). For each of the five
ratios we plot $\langle R \rangle_{Q_i}$ as a function of $\langle P \rangle$, where $\langle R \rangle_{Q_i}$ is
the ith moving quantile of ratio R and $\langle P \rangle$ is the moving
median of the steady-state property of interest. We present
results for i = 0.05, i = 0.5 and i = 0.95. The moving
window size used in the calculations is 500. The generic
shapes of the correlation curves are shown in Figure 4,
and the results of the correlation analysis are summarized
with reference to these shapes in Table 2.
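
The moving-quantile construction described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of Alves and Savageau: the window is assumed to slide one system at a time over the ensemble ordered by the property P, the quantile is computed by linear interpolation between order statistics, and the ensemble itself is synthetic. The names `quantile` and `moving_quantiles` are hypothetical.

```python
import random

def quantile(sorted_values, q):
    """Quantile of pre-sorted data, by linear interpolation between order statistics."""
    if not sorted_values:
        raise ValueError("empty sample")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def moving_quantiles(prop, ratio, window, qs=(0.05, 0.5, 0.95)):
    """Return a list of (median of P in window, {q: quantile of R in window})."""
    pairs = sorted(zip(prop, ratio))            # order the systems by property P
    curve = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        p_sorted = [p for p, _ in chunk]         # already in increasing order of P
        r_sorted = sorted(r for _, r in chunk)
        curve.append((quantile(p_sorted, 0.5),
                      {q: quantile(r_sorted, q) for q in qs}))
    return curve

# Synthetic ensemble (illustrative only): 5000 systems, window of 500 as in the text.
rng = random.Random(0)
P = [rng.uniform(0.0, 4.0) for _ in range(5000)]
R = [rng.uniform(0.0, 1.0) for _ in range(5000)]
curve = moving_quantiles(P, R, window=500)
```

Plotting the three quantile values against the moving median of P would give one contour per quantile, from which the shape of the correlation and its spread can be read.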

Each moving quantile curve for the same R, $\langle R \rangle_{Q_i}$,
represents a contour that shows how a given quantile of
R is correlated with a particular magnitude of interest. By
building a contour plot with several different quantiles,
we can empirically evaluate the quality of the predicted

Fig.  4.  Qualitatively  different  shapes  for  the  correlation  curves 
between  different  systemic  properties.  The  correlation  is  determined 
by  a  plot  of  the  moving  median  for  one  property  versus  the  moving 
median  for  another  constructed  from  an  ensemble  of  systems 
selected  on  the  basis  of  various  behavioral  classes  (see  Alves  and 
Savageau, 2000b). The nine shapes (C1 through C9) include all the
tabulated  shapes  found  by  examination  of  the  actual  graphs.  These 
shapes  are  referenced  in  Table  2. 


correlation,  as  well  as  obtain  non-parametric  confidence 
interval  curves  for  the  moving  median. 

An  example  of  such  a  contour  plot  is  presented  in 
Figure  5  for  the  correlation  between  the  different  moving 
quantiles  of  the  ratio  B  (from  Table  1)  and  the  2nd  Routh 
condition  for  local  stability.  The  plot  gives  information 
about  the  dispersion  of  B  as  a  function  of  the  2nd  Routh 
criterion.  This  dispersion  decreases  as  the  value  of  the 
Routh  criterion  increases.  At  low  values  of  the  Routh 
criterion the 5% quantile of B is very close to -1 and
the 95% quantile is very close to 1, whereas for high
values of the stability margin the 5% quantile is about
-0.6 and the 95% quantile is about 0.7. In plots involving
other quantities, the dispersion may increase or remain
unchanged as the quantity on the x-axis increases.

The second type of information one can extract from
Figure 5 regards the quality of the predicted correlation
between B and the 2nd Routh criterion. From the moving
quantile plot involving $Q_{0.5}$ we determine that, on average,
there is no correlation between B and the value for
the 2nd Routh criterion. The other moving quantile curves
show that, for high values of the stability margin, this absence
of correlation is maintained for all moving quantiles.
However, in the region of low values for the stability

Table 2. Correlation between the five critical ratios and various systemic properties

Columns: systemic property; critical ratios (a) A, B, C, D and E, each reported at $Q_{0.05}$, $Q_{0.5}$ and $Q_{0.95}$.

a The expressions for the critical ratios A, B, and C are given in the footnote of Table 1. The expression for ratio D is given following Equation (8) in the
text. Ratio E is the ratio of transient times, which are determined numerically.

b The q values refer to the shape of the curves in Figure 5. We present the shape of the curves with a 90% confidence interval. For example, the correlation of
ratio B with $L(X_1, X_4)$ has a shape C2 but its 90% boundaries show that this form can change (smoothly) between C9 and C4.


margin, a positive correlation between the stability margin
and B starts to develop as the quantile of B decreases
from $Q_{0.5}$ to $Q_{0.05}$. Symmetrically, a negative correlation
develops as the quantile of B increases from $Q_{0.5}$ to $Q_{0.95}$.
As $Q_i$ tends to $Q_{0.0}$ or to $Q_{1.0}$ these correlations tend to be
more pronounced. One interpretation is that, for low values
of the stability margins, there is a larger uncertainty
about the correlation between B and the stability margins.

Correlations among the four analytically determined ratios
(A-C in Table 1 and D in equation (8)) plus the ratio
of transient times that we determine numerically (E) are
shown in Figure 6. It can be seen that the ratios A, D and
E are directly correlated. This means that systems with
high values (i.e. close to 1) for A will also, on average,
have high values for D and E (i.e. close to 1). Similarly,
the ratios B and C are directly correlated. On the other
hand, the values of ratios B and C change from negatively
to positively correlated with the other three ratios as the
values of these other three ratios increase.

Summary 

The numerical method reproduces the qualitative results
that are obtained analytically, as should be expected. Furthermore,
the numerical comparison extends the analytical
results by providing quantitative results. For example,
overall feedback inhibition decreases the stability margins
of the steady state, which was shown quantitatively to be
on average a minimal effect, and increases the robustness
of the system, which was shown quantitatively to be a
highly significant effect. The numerical approach also
provides a way to compare the temporal responsiveness
of the alternative models following perturbations in
the steady-state concentrations. For our model systems
we found that overall feedback inhibition significantly
decreases the response time of the reference system (with
overall feedback), compared to that of the alternative
system (without overall feedback). Finally, we determined
how the different ratios are correlated with systemic properties
and with parameters of interest, and we presented
a way to determine the confidence one should place on
these correlations. Thus, the numerical approach has
significantly extended the scope of application beyond
that of the analytical approach.

Discussion 

The  method  of  mathematically  controlled  comparison 
has  been  used  since  the  early  1970s  as  a  powerful  tool 
to  characterize  alternative  designs  for  several  classes 
of  biological  systems.  In  each  case,  this  comparative 
technique  has  provided  insight  into  the  natural  selection 
of  the  various  designs.  The  results  obtained  in  some  cases 
are  independent  of  specific  parameter  values. 

Fig. 5. Example of the use of moving quantiles to establish the level of confidence in the correlation between magnitudes of interest. The plot
represents the correlation of different moving quantiles of the critical ratio B (defined in Table 1) as a function of the 2nd Routh condition.
The thickest line represents the moving median of B. The first lines above and below the moving median represent quantiles 0.65 and 0.35
respectively. The progression of lines above and below represent quantiles that increase and decrease, respectively, by intervals of 0.10.


In many other cases, the results of the analysis will depend
on the numerical values of the parameters. However,
if the range of values for the parameters is known and
their influence does not change abruptly over the range
of interest, then random sampling can be used effectively
to make numerical comparisons without exact knowledge
of the parameter values. If one knows the distribution of
values for each parameter, then one can generate random
numbers with the appropriate distributions in order to obtain
a large set of parameters and statistically study the
differences between various designs. More often than not
these distributions are unknown, because there are enormous
numbers of components and interactions that need to
be identified. In the absence of a priori knowledge about
the distributions for the parameter values, we have generated
random numbers with a uniform distribution and then
refined the distributions by accepting vectors of parameter
values only if they create a model that has the behavioral
characteristics of interest, e.g. in this paper, models
with stable steady states (see Alves and Savageau (2000b)
for an analysis of models belonging to different behavioral
classes).
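
The sample-then-accept procedure described above can be illustrated with a small Python sketch. It is a stand-in under loud assumptions, not the authors' implementation: instead of sampling kinetic parameters and building the model, it draws the nine entries of a linearized 3x3 system matrix uniformly from a hypothetical range and keeps a draw only if the textbook Routh-Hurwitz conditions for a cubic characteristic polynomial hold. All function names and ranges are illustrative.

```python
import random

def char_poly_coeffs(m):
    """Coefficients (a1, a2, a3) of det(lambda*I - M) = lambda^3 + a1*lambda^2
    + a2*lambda + a3 for a 3x3 matrix M given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    trace = a + e + i
    minors = (a * e - b * d) + (a * i - c * g) + (e * i - f * h)
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return -trace, minors, -det

def is_stable(m):
    """Routh-Hurwitz test for local stability of a three-variable linear system."""
    a1, a2, a3 = char_poly_coeffs(m)
    return a1 > 0 and a3 > 0 and a1 * a2 - a3 > 0

def sample_stable_ensemble(n_sets, low=-1.0, high=1.0, seed=0):
    """Draw uniform random matrices and accept only those with a stable steady state."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_sets:
        m = [[rng.uniform(low, high) for _ in range(3)] for _ in range(3)]
        if is_stable(m):
            accepted.append(m)
    return accepted

# Small ensemble for illustration; the text uses 5000 accepted parameter sets.
ensemble = sample_stable_ensemble(200)
```

The accepted draws no longer follow the original uniform distribution, which is the sense in which the acceptance step refines the distributions.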


Analytical  comparisons  in  this  paper  demonstrate  that 
the  reference  model  is  more  robust  and  has  smaller 
stability  margins  than  the  alternative  model.  They  also 
show  that  the  flux  and  concentrations  in  the  reference 
model  are  less  sensitive  to  changes  in  demand  for  end 
product. However, analytical comparison cannot give us
any  qualitative  information  about  the  relative  transient 
times  of  the  two  models,  nor  can  it  tell  us  anything 
quantitative  about  the  magnitude  of  the  differences  in 
transient  times  between  the  two  models. 

The method of numerical comparison provides information
about the alternative designs that could not have
been obtained by exclusive use of analytical comparisons.
It shows that the relative differences in parameter sensitivities
and transient times between the reference and the
alternative models are, on average, much larger than those
between stability margins. This implies that differences
in stability margins are not very relevant for the selection
of overall feedback inhibition. Moreover, this approach
shows that more than 99% of all reference models have
faster transient responses than the corresponding alternative
models. This reinforces the idea that overall feedback

Fig. 6. Moving median plots that reveal the correlations among the five critical ratios (A-E) obtained while comparing the systems in
Figure 1.


inhibition  improves  the  function  of  these  biosynthetic 
pathways. 

With  the  method  of  numerical  comparison  we  also  show 
that,  among  the  five  critical  ratios  that  characterize  the 
alternative  designs  in  Figure  1,  there  are  two  groups  within 
which  the  ratios  are  directly  correlated  with  each  other. 
One  group  includes  the  ratios  A,  D  and  E;  the  other 
includes  the  ratios  B  and  C.  Members  of  the  second  group 
are  negatively  correlated  with  members  of  the  first  group 
at  low  values  and  positively  correlated  at  high  values. 

The  introduction  of  contour  density  of  ratios  plots  (i.e. 
plots  having  different  moving  quantiles  for  the  y-axis) 
provides  a  measure  for  the  uncertainty  in  the  correlations 
between  ratios  and  systemic  properties  of  interest.  In  most 
of  the  cases  analyzed  in  this  paper  the  correlation  holds 
with  a  90%  confidence  interval  (i.e.  the  correlation  is 
always  positive,  negative  or  null  no  matter  what  quantile  is 
used,  as  can  be  seen  in  Table  2).  However,  in  some  cases, 
such  as  that  in  Figure  5,  there  is  more  uncertainty  about 
the correlations. Although the nature of the correlations
will be model and behavioral-class dependent, there are
some properties of these contour plots that are general (see
Appendix).

Thus,  the  method  of  numerical  comparison  presented  in 
this  paper  allows  one  to  quantify,  and  in  some  cases  to 
eliminate,  the  uncertainties  associated  with  the  analytical 
approach  to  mathematically  controlled  comparison.  This 
generalization  allows  one  to  obtain  more  information  from 
the  comparison.  It  also  allows  one  to  focus  the  comparison 
on  systems  that  are  considered  most  appropriate  for  each 
design,  simply  by  selecting  from  randomly  generated 
parameter  values  ensembles  of  parameter  sets  that  give 
rise  to  systems  that  are  considered  appropriate  with  respect 
to  properties  of  interest.  This  provides  a  means  to  make  the 
comparison  more  significant. 
Appendix

Consider a contour density of ratios plot of property $P_1$
versus property $P_2$, such as the one presented in Figure 5.
If $\langle P_1 \rangle_{Q_k}$ represents the $Q_k$ moving quantile curve for
property $P_1$, $\langle P_1 \rangle_{Q_m}$ represents the $Q_m$ moving quantile
curve for the same property, $k < m$, and the same
window size is used to calculate the two curves, then
contour density of ratios plots have the following generic
properties:

(1) For any given value of $\langle P_2 \rangle$ on the x-axis, the curve
$\langle P_1 \rangle_{Q_k} \le \langle P_1 \rangle_{Q_m}$.

(2) The shape of the curve for $\langle P_1 \rangle_{Q_r}$, with $k < r < m$,
can only change progressively between the format
of the curve $\langle P_1 \rangle_{Q_k}$ and the format of the curve
$\langle P_1 \rangle_{Q_m}$.

The proof for the first property comes from the fact
that, for the same value of $\langle P_2 \rangle$, $\langle P_1 \rangle_{Q_k}$ and $\langle P_1 \rangle_{Q_m}$ are
different quantiles of the same sample. Thus, if $k < m$
then $Q_k \le Q_m$.

The proof of the second property is also very simple.
From Property 1 we know that $\langle P_1 \rangle_{Q_k} \le \langle P_1 \rangle_{Q_m}$. Thus,
$\langle P_1 \rangle_{Q_k} \le \langle P_1 \rangle_{Q_r} \le \langle P_1 \rangle_{Q_m}$. At each value of $\langle P_2 \rangle$, the
maximum number of different quantiles is $W$, which is the
window size associated with the total sample of size $S$. Let
$Q_{i/W}$ represent the quantile $i/W$, where $i$ can vary from 1
to $W$. As $1/W \to 0$, such that the ratio $W/S \to$ constant,
$\langle P_1 \rangle_{Q_{i/W}} \to \langle P_1 \rangle_{Q_{(i+1)/W}}$. Thus, since
$\langle P_1 \rangle_{Q_{(i-1)/W}} \le \langle P_1 \rangle_{Q_{i/W}} \le \langle P_1 \rangle_{Q_{(i+1)/W}}$,
the shape of the curves will
change progressively from $\langle P_1 \rangle_{Q_{0.0}}$ to $\langle P_1 \rangle_{Q_{1.0}}$.

The control of gene expression involves complex circuits that exhibit enormous variation in design.
For  years  the  most  convenient  explanation  for  these  variations  was  historical  accident.  According  to 
this  view,  evolution  is  a  haphazard  process  in  which  many  different  designs  are  generated  by 
chance;  there  are  many  ways  to  accomplish  the  same  thing,  and  so  no  further  meaning  can  be 
attached  to  such  different  but  equivalent  designs.  In  recent  years  a  more  satisfying  explanation  based 
on  design  principles  has  been  found  for  at  least  certain  aspects  of  gene  circuitry.  By  design  principle 
we  mean  a  rule  that  characterizes  some  biological  feature  exhibited  by  a  class  of  systems  such  that 
discovery  of  the  rule  allows  one  not  only  to  understand  known  instances  but  also  to  predict  new 
instances within the class. The central importance of gene regulation in modern molecular biology
provides  strong  motivation  to  search  for  more  of  these  underlying  design  principles.  The  search  is 
in  its  infancy  and  there  are  undoubtedly  many  design  principles  that  remain  to  be  discovered.  The 
focus  of  this  three-part  review  will  be  the  class  of  elementary  gene  circuits  in  bacteria.  The  first  part 
reviews  several  elements  of  design  that  enter  into  the  characterization  of  elementary  gene  circuits  in 
prokaryotic  organisms.  Each  of  these  elements  exhibits  a  variety  of  realizations  whose  meaning  is 
generally  unclear.  The  second  part  reviews  mathematical  methods  used  to  represent,  analyze,  and 
compare  alternative  designs.  Emphasis  is  placed  on  particular  methods  that  have  been  used 
successfully  to  identify  design  principles  for  elementary  gene  circuits.  The  third  part  reviews  four 
design  principles  that  make  specific  predictions  regarding  (1)  two  alternative  modes  of  gene  control, 
(2)  three  patterns  of  coupling  gene  expression  in  elementary  circuits,  (3)  two  types  of  switches  in 
inducible  gene  circuits,  and  (4)  the  realizability  of  alternative  gene  circuits  and  their  response  to 
phased  environmental  cues.  In  each  case,  the  predictions  are  supported  by  experimental  evidence. 
These  results  are  important  for  understanding  the  function,  design,  and  evolution  of  elementary  gene 
circuits.  ©  2001  American  Institute  of  Physics.  [DOI:  10.1063/1.1349892] 


Gene circuits sense their environmental context and orchestrate
the expression of a set of genes to produce appropriate
patterns of cellular response. The importance
of this role has made the experimental study of gene
regulation central to nearly all areas of modern molecular
biology. The fruits of several decades of intensive investigation
have been the discovery of a plethora of both
molecular mechanisms and circuitry by which these are
interconnected. Despite this impressive progress we are
at a loss to understand the integrated behavior of most
gene circuits. Our understanding is still fragmentary and
descriptive; we know little of the underlying design principles.
Several elements of design, each exhibiting a variety
of realizations, have been identified among elementary
gene circuits in prokaryotic organisms. The use of
well-controlled mathematical comparisons has revealed
design principles that appear to govern the realization of
these elements. These design principles, which make specific
predictions supported by experimental data, are important
for understanding the normal function of gene
circuits; they also are potentially important for developing
judicious methods to redirect normal expression for
biotechnological purposes or to correct pathological expression
for therapeutic purposes.

I.  INTRODUCTION 

The gene circuitry of an organism connects its gene set
(genome) to its patterns of phenotypic expression. The genotype
is determined by the information encoded in the DNA
sequence, the phenotype is determined by the context-dependent
expression of the genome, and the circuitry interprets
the context and orchestrates the patterns of expression.
From this perspective it is clear that gene circuitry is at the
heart of modern molecular biology. However, the situation is
considerably more complex than this simple overview would
suggest. Experimental studies of specific gene systems by
molecular biologists have revealed an immense variety of
molecular mechanisms that are combined into complex gene
circuits, and the patterns of gene expression observed in response
to environmental and developmental signals are
equally diverse.

The  enormous  variety  of  mechanisms  and  circuitry 
raises  questions  about  the  bases  for  this  diversity.  Are  these 
variations  in  design  the  result  of  historical  accident  or  have 
they  been  selected  for  specific  functional  reasons?  Are  there 
design  principles  that  can  be  discovered?  By  design  principle 

FIG. 1. Schematic diagram of a bacterial transcription unit. The structure of
the unit consists of two genes (G1 and G2), bounded by a promoter sequence
(P) and a terminator sequence (T), and preceded by upstream modulator
sites (M1 and M2) that bind regulators capable of altering transcription initiation.
The solid arrow represents the mRNA transcript.

we mean a rule that characterizes some biological feature
exhibited by a class of systems such that discovery of the
rule allows one not only to understand known instances but
also to predict new instances within the class. For many
years, most molecular biologists assumed that accident
played the dominant role, and the search for rules received
little attention. More recently, simple rules have been identified
for a few variations in design. Accident and rule both
have a role in evolution and their interplay has become
clearer in these well-studied cases. This area of investigation
is in its infancy and many such questions remain unanswered.

This review article addresses the search for design principles
among elementary gene circuits. It reviews first several
elements of design for gene circuits, then mathematical
methods used to study variations in design, and finally examples
of design principles that have been discovered for
elementary gene circuits in prokaryotes.

II.  ELEMENTS  OF  DESIGN  AND  THE  NEED  FOR 
DESIGN  PRINCIPLES 

The behavior of an intact biological system can seldom
be related directly to its underlying genome. There are several
different levels of hierarchical organization that intervene
between the genotype and the phenotype. These levels
are linked by gene circuits that can be characterized in terms
of the following elements of design: transcription unit, input
signaling, mode of control, logic unit, expression cascade,
and connectivity. Each of these elements exhibits a variety of
realizations whose basis is poorly understood.

A.  Transcription  unit 

A landmark in our understanding of gene circuitry was
the discovery by Jacob and Monod of the operon,1 the simplest
of transcription units. This unit of sequence organization
consists of a set of coordinately regulated structural
genes (e.g., G1 and G2 in Fig. 1) that encode proteins, an
upstream promoter site (P) at which transcription of the
genes is initiated, and a downstream terminator site (T) at
which transcription ceases. Modulator sites (e.g., M1 and M2
in Fig. 1) associated with the promoter bind regulatory proteins
that influence the rate of transcription initiation (operator
sites bind repressors that down-regulate high-level promoters,
or initiator sites bind activators that up-regulate low-level
promoters).

Transcription units are the principal feature around
which gene circuits are organized. On the input side, signals
in the extracellular (or intracellular) environment are detected
by binding to specific receptor molecules, which
propagate the signal to specific regulatory molecules in a
process called transduction, although in many cases the regulator
molecules are also the receptor molecules. Regulator
molecules in turn bind to the modulator sites of transcription
units in one of two alternative modes, and the signals are
combined in a logic unit to determine the rate of transcription.
On the output side, transcription initiates an expression
cascade that yields one or many mRNA products, one or
many protein products, and possibly one or many products of
enzymatic activity. Thus, the transcription unit emits a fan-out
of signals, which are then connected in a diverse fashion
to the receptors of other transcription units to complete the
interlocking gene circuitry.

FIG. 2. Input signals for transcription units can arise either from the extracellular
environment or from within the cell. S is a stimulus, Rec and Rec*
are the inactive and active forms of the receptor, and Reg and Reg* are the
inactive and active forms of the regulator. (a) Signal transduction from the
extracellular environment to an intracellular transcription unit via a two-component
system. (b) The extracellular signal molecule is transported into
the cell where it interacts directly with the regulator of a transcription unit.
(c) The signal molecule is transported into the cell where it is transformed
via a metabolic pathway to produce a product that interacts with the regulator
of a transcription unit. (d) The output signal from one transcription unit
is the input signal to another transcription unit within the cell.

B. Input signaling

The input signals for transcription units can arise either
from the external environment or from within the cell. When
signals originate in the extracellular environment, they often
involve binding of signal molecules to specific receptors in
the cellular membrane [Fig. 2(a)]. In bacteria, alterations in
the membrane-bound receptor are communicated directly to
regulator proteins via short signal transduction pathways
called “two-component systems.”2 In other cases, signal
molecules in the environment are transported across the
membrane [Fig. 2(b)], and in some cases are subsequently
modified metabolically [Fig. 2(c)], to become signal molecules
that bind directly to regulator proteins (in these cases
the receptor and regulator are one and the same molecule).
When signals arise from other transcription units within the
cell, the regulator can be the direct output signal from such a
transcription unit [Fig. 2(d)]. It can also be the terminus of a
signal transduction pathway in which the upstream signal is
the output from such a transcription unit. Thus, the input
signals for transcription units are ultimately the regulators,
whether signals are received from the extracellular or
intracellular environment. The regulators in most cases are protein
molecules, although this function can be performed in
some cases by other types of molecules such as anti-sense
RNA.

FIG. 3. Alternative modes of gene control. The top panels illustrate the
negative mode of control in which the bias for expression is ON in the
absence of the regulator, and regulation is achieved by modulating the
effectiveness of a negative element. The bottom panels illustrate the positive
mode of control in which the bias for expression is OFF in the absence of
the regulator, and regulation is achieved by modulating the effectiveness of
a positive element. The solid arrow represents the mRNA transcript. In each
case, induction by the addition of a specific inducer causes the state of the
system to shift from the left to the right, whereas repression by the addition
of a specific co-repressor causes the state of the system to shift from right to
left.

C.  Mode  of  control 

Regulators  exert  their  control  over  gene  expression  by 
acting  in  one  of  two  different  modes  (Fig.  3). 3  In  the  positive 
mode,  they  stimulate  expression  of  an  otherwise  quiescent 
gene,  and  induction  of  gene  expression  is  achieved  by  sup¬ 
plying  the  functional  form  of  the  regulator.  In  the  negative 
mode,  regulators  block  expression  of  an  otherwise  active 
gene,  and  induction  of  gene  expression  is  achieved  by  re¬ 
moving  the  functional  form  of  the  regulator.  Each  of  these 
two  designs  (positive  or  negative)  requires  the  transcription 
unit  to  have  the  appropriate  modulator  site  (initiator  type  or 
operator  type)  and  promoter  function  (low  level  or  high 
level). 

Variations in the level of the functional form of the regulator
can be achieved in different ways. Regulator molecules
can have a constant or constitutive level of expression. In
this case, the functional form of the regulator is created or
destroyed by molecular alterations associated with the binding
of specific ligands (inducers or co-repressors). In other
cases, the regulator is always in the functional form, and its
level of expression varies as the result of changes in its rate
of synthesis or degradation. These different ways of realizing
variations in the functional form of the regulator are found
for both positive and negative modes of control.

FIG. 4. Logic unit with two inputs. The transcription unit is described in
Fig. 1, the regulator R1 interacts with the modulator site M1 via the positive
mode, the regulator R2 interacts with the modulator site M2 via the negative
mode, and the signals are combined by a simple logical function. The logic
table is provided for the logical AND and logical OR functions.

D.  Logic  unit 

The  control  regions  associated  with  transcription  units 
may  be  considered  the  logic  unit  where  input  signals  from 
various  regulators  are  integrated  to  govern  the  rate  of  tran¬ 
scription  initiation.  There  are  two  lines  of  evidence  suggest¬ 
ing  that  most  transcription  units  in  bacteria  have  only  a  few 
regulatory  inputs.  First,  the  early  computational  studies  of 
Stuart  Kauffman  using  abstract  random  Boolean  networks 
suggested  that  two  or  three  inputs  per  transcription  unit  were 
optimal.4  If  the  number  of  inputs  was  fewer  on  average,  the 
behavior  of  the  network  was  too  fixed;  whereas  if  the  num¬ 
ber  was  greater  on  average,  the  behavior  was  too  chaotic. 
The  optimal  behavior  associated  with  a  few  inputs  often  is 
described  as  “operating  at  the  edge  of  chaos.”5  Second, 
with  the  arrival  of  the  genomic  era  and  the  sequencing  of  the 
complete  genome  for  a  number  of  bacteria,  there  is  now 
experimental  evidence  regarding  the  distribution  of  inputs 
per transcription unit. The sequence for Escherichia coli6 has
shown  that  the  number  of  modulator  sites  located  near  the 
promoters  of  transcription  units  is  on  average  approximately 
two  to  three.7  The  large  majority  have  two  and  a  few  have  as 
many  as  five. 

A  simple  logic  unit  is  illustrated  in  Fig.  4  for  the  case 
with two inputs. This example includes the classical lactose
(lac) operon of E. coli, which has a positive and a negative
regulator;  the  AND  function  is  the  logical  operator  by  which 
these  signals  are  combined.8  The  logic  units  of  eukaryotes 
can  be  considerably  more  complex.9 
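
The two-input logic unit described above can be sketched in a few lines of Python. This is only an illustrative toy, not a model taken from the cited work: the function and argument names (`logic_unit`, `activator_bound`, `repressor_bound`) are hypothetical, and the inputs are idealized as Boolean values.

```python
from itertools import product


def logic_unit(activator_bound: bool, repressor_bound: bool,
               combine: str = "AND") -> bool:
    """Return True when transcription is ON for a two-input logic unit.

    Input 1 acts in the positive mode: it permits expression only when
    the activator is bound. Input 2 acts in the negative mode: it permits
    expression only when the repressor is absent.
    """
    permits_positive = activator_bound
    permits_negative = not repressor_bound
    if combine == "AND":
        return permits_positive and permits_negative
    if combine == "OR":
        return permits_positive or permits_negative
    raise ValueError("combine must be 'AND' or 'OR'")


def truth_table(combine: str):
    """List (activator_bound, repressor_bound, output) for all four inputs."""
    return [(a, r, logic_unit(a, r, combine))
            for a, r in product([False, True], repeat=2)]


if __name__ == "__main__":
    for mode in ("AND", "OR"):
        print(mode)
        for row in truth_table(mode):
            print("  activator=%s repressor=%s -> ON=%s" % row)
```

With the AND function, expression is ON only when the activator is bound and the repressor is absent, which is the combination the text attributes to the lac operon.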

E.  Expression  cascades 

Expression  cascades  produce  the  output  signals  from 
transcription  units.  They  typically  reflect  the  flow  of  infor¬ 
mation  from  DNA  to  RNA  to  protein  to  metabolites,  which 
has  been  called  the  “Central  Dogma”  of  molecular  biology. 
The  initial  output  of  a  transcription  unit  is  an  mRNA  mol¬ 
ecule  that  has  a  sequence  complementary  to  the  transcribed 
DNA strand. The mRNA in turn is translated to produce the
encoded  protein  product.  The  protein  product  in  many  in¬ 
stances  is  an  enzyme,  which  in  turn  catalyzes  a  specific  re¬ 
action  to  produce  a  particular  metabolic  product.  This  in 
skeletal  form  is  the  expression  cascade  that  is  initiated  by 
signals  affecting  a  transcription  unit  (Fig.  5). 

There  are  many  variations  on  this  theme.  There  can  be 
additional  stages  in  such  cascades  and  each  of  the  stages  is  a 
potential  target  for  regulation.  For  example,  the  cascade 
might  include  posttranscriptional  or  posttranslational  stages 
in  which  products  are  processed  before  the  next  stage  in  the 
cascade. The cascade can also include a stage in which an
RNA template is used to transcribe a complementary DNA
copy,  as  is  the  case  with  retroviruses  and  retrotransposons. 

There  can  be  multiple  products  produced  at  each  stage  of 
such  cascades.  For  example,  several  different  mRNA  mol¬ 
ecules  can  arise  from  the  same  transcription  unit  by  regula¬ 
tion  of  transcription  termination.  Several  different  proteins 
can  be  synthesized  from  the  same  mRNA  and  this  is  often 
the  case  in  bacteria.  Several  metabolic  products  can  be  pro¬ 
duced  by  a  given  multifunctional  enzyme,  depending  upon 
its  modular  composition.  Thus,  transcription  units  can  be 
considered  to  emit  a  fan  of  output  signals. 

F.  Connectivity 

The  connectivity  of  gene  circuits,  defined  as  the  manner 
in  which  the  outputs  of  transcription  units  are  connected  to 
the  inputs  of  other  transcription  units,  varies  enormously. 
The  evidence  for  E.  coli  suggests  a  fairly  narrow  distribution 
of  input  connections  with  a  mean  of  two  to  three,  whereas 
the distribution of output connections is wider, with some
transcription units having as many as 50 output connections.
A large number of the connections involve
regulator proteins modulating expression of the transcription
unit in which they are encoded, a form of regulation termed
autogenous.10 Another common form of connection involves
the coupling of expression cascades for an effector function
and for its associated regulator.11 Such couplings are called
elementary gene circuits and an example is represented
schematically in Fig. 6.

FIG. 5. Expression cascade that propagates signals in three stages from
DNA to mRNA to enzymes to small molecular weight signaling molecules.
Additional stages are possible, and each stage can give rise to multiple
output signals.

FIG. 6. Connectivity by which expression cascades become coupled.
Elementary circuit consisting of a regulator cascade on the left and an effector
cascade on the right. The protein product that is the output of the left
cascade is a regulator of both transcription units, and the metabolic
intermediate that is an output of the right cascade is an inducer that modulates the
effectiveness of the regulator at each transcription unit.

Connectivity  provides  a  way  of  coordinating  the  expres¬ 
sion  of  related  functions  in  the  cell.12  The  operon,  a  tran¬ 
scription  unit  consisting  of  several  structural  genes  that  are 
transcribed  as  a  single  polycistronic  mRNA,  provides  one 
way  of  coordinating  the  expression  of  several  genes.  Another 
way  is  to  have  each  gene  in  a  separate  transcription  unit  and 
have  all  the  transcription  units  connected  to  the  same  regu¬ 
latory  input  signal.  Such  a  set  of  coordinately  regulated  tran¬ 
scription  units  is  known  as  a  regulon.  Other,  and  more  flex¬ 
ible,  ways  also  exist.  For  example,  when  signals  from  several 
regulators  are  assembled  in  a  combinatorial  fashion  to  gov¬ 
ern  a  collection  of  transcription  units,  each  with  its  own  logic 
unit,  diverse  patterns  of  gene  expression  can  be  orchestrated 
in  response  to  a  variety  of  environmental  contexts. 

III.  METHODS  FOR  COMPARING  DESIGNS  TO 
REVEAL  DESIGN  PRINCIPLES 

Several  different  approaches  have  been  used  to  analyze 
and  compare  gene  circuits,  and  each  has  contributed  in  dif¬ 
ferent  ways  to  our  understanding.  Here  I  need  only  mention 
three  of  the  approaches  that  have  been  dealt  with  in  greater 
detail  elsewhere. 

A.  Types  of  models 

Simplified  models  based  on  random  Boolean  networks 
have  been  used  to  explore  properties  that  are  likely  to  be 
present  with  high  probability  regardless  of  mechanistic  de¬ 
tails  or  evolutionary  history.  These  tend  to  be  discrete/ 
deterministic  models  that  permit  efficient  computational  ex¬ 
ploration  of  large  populations  of  networks,  which  then 
permit  statistical  conclusions  to  be  drawn.  The  work  of 
Kauffman  provides  an  example  of  this  approach.4  The  ele¬ 
ments  of  design  emphasized  in  this  approach  are  the  input 
logic  units  and  the  connectivity,  and  properties  of  the  net¬ 
work  are  examined  as  a  function  of  network  size. 
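
A minimal sketch of the kind of discrete/deterministic model described above is given here. It builds a random Boolean network in which every node has the same number of inputs and reports the length of the attractor cycle reached from a given initial state. The helper names (`make_network`, `step`, `attractor_length`) and the network size are assumptions chosen for illustration, not details of the cited studies.

```python
import random


def make_network(n: int, k: int, seed: int):
    """Create a random Boolean network with n nodes and k inputs per node.

    Returns (inputs, tables): inputs[i] lists the k nodes feeding node i,
    and tables[i] is a random truth table with 2**k entries.
    """
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every node from the current state."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:
            index = (index << 1) | state[j]  # pack input bits into an index
        new_state.append(table[index])
    return tuple(new_state)


def attractor_length(state, inputs, tables):
    """Iterate until a state repeats; return the length of the cycle."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return t - seen[state]


if __name__ == "__main__":
    for k in (1, 2, 5):
        net_inputs, net_tables = make_network(n=12, k=k, seed=0)
        start = tuple([0] * 12)
        print("k =", k, "cycle length =",
              attractor_length(start, net_inputs, net_tables))
```

Statistical conclusions of the sort mentioned in the text would require repeating such runs over many networks and initial states; a single run only illustrates the mechanics.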

Detailed mechanistic models have been used to test our
understanding of particular gene circuits. The goal is to represent
the detailed behavior as faithfully as possible. A mixture
of discrete/continuous/deterministic/stochastic model elements
might be used, depending upon the particular circuit.
These are computationally intensive and require numerical
values for the parameters, and detailed quantitative comparisons
with experimental data are important means of validation.
The work of Arkin and colleagues illustrates this
approach.13 The elements of design emphasized in this approach
are all those that manifest themselves in the particular
circuit being modeled.

Generic  models  for  specific  classes  of  circuits  have  been 
used  to  identify  design  principles  for  each  class.  The  aim  of 
these  models  is  to  capture  qualitative  features  of  behavior 
that  hold  regardless  of  the  specific  values  for  the  parameters 
and  hence  that  are  applicable  to  the  entire  class  being  char¬ 
acterized.  These  tend  to  be  continuous/deterministic  models 
with  a  regular  formal  structure  that  facilitates  analytical  and 
numerical  comparisons.  Examples  of  this  approach  will  be 
reviewed  below  in  Sec.  IV.  The  elements  of  design  that  tend 
to  be  emphasized  in  this  approach  are  expression  cascades, 
modes  of  control,  input  logic  units,  and  connectivity. 


B.  A  comparative  approach  to  the  study  of  design 

The  elucidation  of  design  principles  for  a  class  of  cir¬ 
cuits  requires  a  formalism  to  represent  alternative  designs, 
methods  of  analysis  capable  of  predicting  behavior,  and 
methods  for  making  well-controlled  comparisons. 

1.  Canonical  nonlinear  representation 

The  power-law  formalism  combines  nonlinear  elements 
having  a  very  specific  structure  (products  of  power  laws) 
with  a  linear  operator  (differentiation)  to  form  a  set  of  ordi¬ 
nary  differential  equations,  which  are  capable  of  representing 
any  suitably  differentiable  nonlinear  function.  This  makes  it 
an  appropriate  formalism  for  representing  alternative  de¬ 
signs. 

The  elements  of  the  power-law  formalism  are  nonlinear 
functions  consisting  of  simple  products  of  power-law  func¬ 
tions  of  the  state  variables14 

$$ v_i(\mathbf{X}) = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} X_3^{g_{i3}} \cdots X_n^{g_{in}}. \tag{1} $$

The two types of parameters in this formalism are referred to
as multiplicative parameters ($\alpha_i$) and exponential parameters
($g_{ij}$). They also are referred to as rate-constant parameters
and kinetic-order parameters, since these are accepted
terms in the context of chemical and biochemical kinetics.
The multiplicative parameters are non-negative real, the
exponential parameters are real, and the state variables are
positive real.

Although  the  nonlinear  behavior  exhibited  by  these  non¬ 
linear  elements  is  fairly  impressive,  it  does  not  represent  the 
full  spectrum  of  nonlinear  behavior  that  is  characteristic  of 
the  power-law  formalism.  When  these  nonlinear  elements  are 
combined  with  the  differential  operator  to  form  a  set  of  or¬ 
dinary  differential  equations  they  are  capable  of  representing 
any  suitably  differentiable  nonlinear  function.  The  two  most 
common  representations  within  the  power-law  formalism  are 
generalized-mass-action  (GMA)  systems 


$$ dX_i/dt = \sum_{k=1}^{r} \alpha_{ik} \prod_{j=1}^{n+m} X_j^{g_{ijk}} - \sum_{k=1}^{r} \beta_{ik} \prod_{j=1}^{n+m} X_j^{h_{ijk}}, \quad i = 1, \ldots, n \tag{2} $$

and synergistic (S) systems

$$ dX_i/dt = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n+m} X_j^{h_{ij}}, \quad i = 1, \ldots, n. \tag{3} $$

The derivatives of the state variables with respect to time t
are given by $dX_i/dt$. The $\alpha$ and $g$ parameters are defined as
in Eq. (1) and are used to characterize the positive terms in
Eqs. (2) and (3), whereas the $\beta$ and $h$ parameters are similarly
defined and are used to characterize the negative terms.
There  are  in  general  n  dependent  variables,  m  independent 
variables,  and  a  maximum  of  r  terms  of  a  given  sign.  The 
resulting  power-law  formalism  can  be  considered  a  canonical 
nonlinear  representation  from  at  least  three  different  perspec¬ 
tives:  fundamental,  recast,  and  local.15 
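
To make the S-system form concrete, the following sketch evaluates the right-hand side of an S-system and integrates it with a classical fourth-order Runge-Kutta step. It is a plain illustration under stated assumptions: the function names, the two-variable example pathway, and all parameter values are hypothetical and are not taken from the original text, and the integrator is a generic one rather than the specialized algorithm discussed later in this section.

```python
def s_system_rhs(x_dep, x_indep, alpha, beta, g, h):
    """Evaluate dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j].

    x_dep holds the n dependent variables and x_indep the m independent
    ones; each row of g and h has n + m kinetic orders.
    """
    x = list(x_dep) + list(x_indep)
    rates = []
    for i in range(len(x_dep)):
        production = alpha[i]
        removal = beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]
            removal *= xj ** h[i][j]
        rates.append(production - removal)
    return rates


def integrate_rk4(x_dep, x_indep, alpha, beta, g, h, dt, steps):
    """Integrate the S-system with fixed-step classical Runge-Kutta."""
    def f(state):
        return s_system_rhs(state, x_indep, alpha, beta, g, h)

    x = list(x_dep)
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = f([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical example: X3 (independent) drives synthesis of X1, X1 is
# converted to X2, and X2 inhibits synthesis of X1 (kinetic order -0.5).
ALPHA = [2.0, 1.0]
BETA = [1.0, 1.0]
G = [[0.0, -0.5, 1.0],
     [1.0, 0.0, 0.0]]
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

if __name__ == "__main__":
    final = integrate_rk4([1.0, 1.0], [4.0], ALPHA, BETA, G, H,
                          dt=0.01, steps=3000)
    print(final)  # approaches the steady state X1 = X2 = 4
```

For these assumed parameters the steady state can be checked by hand: the second equation gives X1 = X2, and the first then gives 8 X1^(-0.5) = X1, so X1 = 4.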

As  the  natural  representation  of  the  elements  postulated 
to  be  fundamental  in  a  variety  of  fields,  the  power-law  for¬ 
malism  can  be  considered  a  canonical  nonlinear  representa¬ 
tion.  There  are  a  number  of  representations  that  are  consid¬ 
ered  fundamental  descriptions  of  the  basic  entities  in  various 
fields.  Four  such  representations  that  are  extensively  used  in 
chemistry,  population  biology,  and  physiology  are  mass- 
action,  Volterra-Lotka,  Michaelis-Menten,  and  linear  repre¬ 
sentations.  These  are,  in  fact,  special  cases  of  the  GMA- 
system  representation,15  which,  as  noted  earlier,  is  one  of  the 
two most common representations within the general framework
of the power-law formalism. Although the power-law
formalism  can  be  considered  a  fundamental  representation  of 
chemical  kinetic  events,  this  is  not  the  most  useful  level  of 
representation  for  comparing  gene  circuits  because  it  is  much 
too  detailed  and  values  for  many  of  the  elementary  param¬ 
eters  will  not  be  available.  Nor  does  the  structure  of  the 
GMA  equations  lend  itself  to  general  symbolic  analysis. 

As  a  recast  description,  the  power-law  formalism  can  be 
considered  a  canonical  nonlinear  representation  in  nearly  ev¬ 
ery  case  of  physical  interest.  This  is  because  any  nonlinear 
function  or  set  of  differential  equations  that  is  a  composite  of 
elementary  functions  can  be  transformed  exactly  into  the 
power-law formalism through a procedure called recasting.16
This  is  a  well-defined  procedure  for  generating  a  globally 
accurate  representation  that  is  functionally  equivalent  to  the 
original  representation.  In  this  procedure  one  trades  fewer 
equations  with  more  complex  and  varied  forms  of  nonlinear¬ 
ity  for  more  equations  with  simpler  and  more  regular  nonlin¬ 
ear  forms.  Although  the  power-law  formalism  in  the  context 
of  recasting  has  important  uses  and  allows  for  efficient  nu¬ 
merical  solution  of  differential  equations,  this  again  is  not 
the  most  useful  level  of  representation  for  comparing  alter¬ 
native  designs  for  gene  circuits  because  it  does  not  lend  itself 
to  general  systematic  analysis. 

As  a  local  description,  the  power-law  formalism  can  be 
considered  a  canonical  nonlinear  representation  that  is  typi¬ 
cally  accurate  over  a  wider  range  of  variation  than  the  cor¬ 
responding  linear  representation.  The  state  variables  of  a  sys¬ 
tem  can  nearly  always  be  defined  as  positive  quantities. 
Therefore, functions of the state variables can be represented
equivalently in a logarithmic space, i.e., a space in which
the  logarithm  of  the  function  is  a  function  of  the  logarithms 
of  the  state  variables.  This  means  that  a  Taylor  series  in 
logarithmic  space  can  also  be  used  as  a  canonical  represen¬ 
tation  of  the  function.  If  the  variables  make  only  small  ex¬ 
cursions  about  their  nominal  operating  values,  then  this  se¬ 
ries  can  be  truncated  at  the  linear  terms,  transformed  back 
into  Cartesian  coordinates,  and  expressed  in  the  power-law 
formalism.  Thus,  Taylor’s  theorem  gives  a  rigorous  justifi¬ 
cation  for  the  local  power-law  formalism  and  specific  error 
bounds  within  which  it  will  provide  an  accurate  representa¬ 
tion. 
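
As a worked illustration of the local representation, the sketch below derives the power-law approximation of a Michaelis-Menten rate law about an operating point and compares it with the ordinary linear (Taylor) approximation about the same point. The function names and the numerical values are assumptions chosen for illustration. The kinetic order is the slope in logarithmic space, g = d ln v / d ln X, which for this rate law equals K / (K + X0).

```python
def michaelis_menten(x, vmax, km):
    """Exact rate law v = vmax * x / (km + x)."""
    return vmax * x / (km + x)


def local_power_law(x0, vmax, km):
    """Return (alpha, g) such that v is approximated by alpha * x**g near x0.

    g is the first-order Taylor coefficient in logarithmic space and
    alpha is fixed by requiring agreement with the exact rate at x0.
    """
    v0 = michaelis_menten(x0, vmax, km)
    g = km / (km + x0)
    alpha = v0 / x0 ** g
    return alpha, g


def local_linear(x0, vmax, km):
    """Return (v0, slope) of the first-order Taylor series in Cartesian space."""
    v0 = michaelis_menten(x0, vmax, km)
    slope = vmax * km / (km + x0) ** 2
    return v0, slope


if __name__ == "__main__":
    vmax, km, x0 = 1.0, 1.0, 1.0
    alpha, g = local_power_law(x0, vmax, km)
    v0, slope = local_linear(x0, vmax, km)
    for x in (0.25, 0.5, 1.0, 2.0, 4.0):
        exact = michaelis_menten(x, vmax, km)
        power = alpha * x ** g
        linear = v0 + slope * (x - x0)
        print("x=%.2f exact=%.4f power-law=%.4f linear=%.4f"
              % (x, exact, power, linear))
```

For this particular rate law and operating point, the power-law approximation stays closer to the exact rate than the linear one when X is halved or doubled; this is one example only and does not by itself establish the general claim.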

A  rigorous  and  systematic  analysis  of  the  second-order 
contributions  to  the  local  power-law  representation  has  been 
developed  by  Salvador.17,18  This  analysis  provides  a  valuable 
approach  for  making  rational  choices  concerning  model  re¬ 
duction.  By  determining  the  second-order  terms  in  the 
power-law  approximation  of  a  more  complex  model  one  can 
determine  those  parts  of  the  model  that  are  accurately  repre¬ 
sented  by  the  first-order  terms.  These  parts  of  the  model  can 
be  safely  represented  by  the  local  representation;  those  parts 
that  would  not  be  represented  with  sufficient  accuracy  can 
then  be  dealt  with  in  a  variety  of  ways,  including  a  more 
fundamental  model  or  a  recast  model,  either  of  which  would 
leave  the  resulting  model  within  the  power-law  representa¬ 
tion. 

The  local  S-system  representation  within  the  power-law 
formalism  has  proved  to  be  more  fruitful  than  the  local 
GMA-system  representation  because  of  its  accuracy  and 
structure.  It  is  typically  more  accurate  because  it  allows  for 
cancellation  of  systematic  errors.19,20  It  has  a  more  desirable 
structure  from  the  standpoint  of  general  symbolic  analysis: 
there  is  an  analytical  condition  for  the  existence  of  a  steady 
state,  an  analytical  solution  for  the  steady  state,  and  an  ana¬ 
lytical  condition  that  is  necessary  for  the  local  stability  of  the 
steady  state.  The  regular  structure  and  tractability  of  the 
S-system  representation  is  an  advantage  in  systematic  ap¬ 
proaches  for  inferring  the  structure  of  gene  networks  from 
global  expression  data.21 

The  S-system  representation,  like  the  linear  and 
Volterra-Lotka  representations,  exhibits  the  same  structure 
at  different  hierarchical  levels  of  organization.22  We  call  this 
the  telescopic  property  of  the  formalism.  Only  a  few  formal¬ 
isms  are  known  to  exhibit  this  property.  A  canonical  formal¬ 
ism  that  provides  a  consistent  representation  across  various 
levels  of  hierarchical  organization  in  space  and  time  has  a 
number  of  advantages.  For  example,  consider  a  system  de¬ 
scribed  by  a  set  of  S-system  equations  with  n  dependent 
variables.  Now  suppose  that  the  variables  of  the  system  form 
a  temporal  hierarchy  such  that  k  of  them  determine  the  tem¬ 
poral  behavior  of  the  system.  The  n  —  k  “fast”  variables  are 
further  assumed  to  approach  a  quasi-steady  state  in  which 
they  are  now  related  to  the  k  temporally  dominant  variables 
by  power-law  equations.  When  these  relationships  are  sub¬ 
stituted  into  the  differential  equations  for  the  temporally 
dominant  variables,  a  new  set  of  differential  equations  with  k 
dependent  variables  is  the  result.  This  reduced  set  is  also  an 
S-system; that is, the temporally dominant subsystem is represented
within the same power-law formalism. Thus, the
same methods of analysis can be applied at each hierarchical
level.
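
The telescopic property can be illustrated with the smallest possible case, n = 2 and k = 1, in which X_2 is the fast variable. The symbols c and p are introduced here only as shorthand for the multiplicative and exponential parameters of the quasi-steady-state relation; they do not appear in the original text, and the sketch assumes g_{22} differs from h_{22}.

```latex
% Full S-system (two dependent variables):
\frac{dX_1}{dt} = \alpha_1 X_1^{g_{11}} X_2^{g_{12}} - \beta_1 X_1^{h_{11}} X_2^{h_{12}}, \qquad
\frac{dX_2}{dt} = \alpha_2 X_1^{g_{21}} X_2^{g_{22}} - \beta_2 X_1^{h_{21}} X_2^{h_{22}}

% Quasi-steady state of the fast variable (set dX_2/dt = 0):
X_2 = c\, X_1^{p}, \qquad
c = \left(\frac{\beta_2}{\alpha_2}\right)^{1/(g_{22}-h_{22})}, \qquad
p = \frac{h_{21}-g_{21}}{g_{22}-h_{22}}

% Substitution into the equation for the slow variable:
\frac{dX_1}{dt} = \bigl(\alpha_1 c^{\,g_{12}}\bigr) X_1^{\,g_{11}+p\,g_{12}}
                - \bigl(\beta_1 c^{\,h_{12}}\bigr) X_1^{\,h_{11}+p\,h_{12}}
```

The reduced equation again has one positive and one negative product of power laws, so it is an S-system with new multiplicative and exponential parameters.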

Power-law  expressions  are  found  at  all  hierarchical  lev¬ 
els  of  organization  from  the  molecular  level  of  elementary 
chemical  reactions  to  the  organismal  level  of  growth  and 
allometric  morphogenesis.15  This  recurrence  of  the  power 
law  at  different  levels  of  organization  is  reminiscent  of  frac¬ 
tal  phenomena,  which  exhibit  the  same  behavior  regardless 
of  scale.23  In  the  case  of  fractal  phenomena,  it  has  been 
shown  that  this  self-similar  property  is  intimately  associated 
with  the  power-law  expression.24  Hence,  it  is  not  surprising 
that  the  power-law  formalism  should  provide  a  canonical 
representation  with  telescopic  properties  appropriate  for  the 
characterization  of  complex  nonlinear  systems. 

Finally,  piecewise  power-law  representations  provide  a 
logical  extension  of  the  local  power-law  representation.  The 
piecewise  linear  representation  has  long  been  used  in  the 
temporal  analysis  of  electronic  circuits.25  It  simplifies  the 
analysis,  converting  an  intractable  nonlinear  system  of  equa¬ 
tions  into  a  series  of  simple  linear  systems  of  equations 
whose  behavior,  when  pieced  together,  is  capable  of  closely 
approximating  that  of  the  original  system.  A  different  use  of 
an  analogous  piecewise  representation  was  developed  by 
Bode  to  simplify  the  interpretation  of  complex  rational  func¬ 
tions  that  characterize  the  frequency  response  of  electronic 
circuits.26  This  type  of  Bode  analysis  was  adapted  for  inter¬ 
pretation  of  the  rational  functions  traditionally  used  to  repre¬ 
sent  biochemical  rate  laws27  and  then  developed  more  fully 
into  a  systematic  power-law  formalism  for  the  local  repre¬ 
sentation  of  biochemical  systems  consisting  of  many  enzy¬ 
matic  reactions.15  In  analogy  with  traditional  piecewise  lin¬ 
ear  analysis,  a  piecewise  power-law  representation  has  been 
developed  and  used  to  analyze  models  of  gene  circuitry  (see 
Sec.  IV  C).  This  form  of  representation  greatly  simplifies  the 
analysis;  it  also  captures  the  essential  nonlinear  behavior 
more  directly  and  with  fewer  segments  than  would  a  piece- 
wise  linear  representation. 

2.  Methods  of  analysis 

The  regular,  systematic  structure  of  the  power-law  for¬ 
malism  implies  that  methods  developed  to  solve  efficiently 
equations  having  this  form  will  be  applicable  to  a  wide  class 
of  phenomena.  This  provides  a  powerful  stimulus  to  search 
for  such  methods.  The  potential  of  the  power-law  formalism 
in  this  regard  has  yet  to  be  fully  exploited.  The  following  are 
some  examples  of  generic  methods  that  have  been  developed 
for  analysis  within  the  framework  of  the  power-law  formal¬ 
ism. 

The simplicity of the local S-system representation has
led to the most extensive development of theory, methodology,
and applications within the power-law formalism.28 Indeed,
as discussed in Sec. III B 1, the local S-system representation
allows the derivation of important systemic
properties that would be difficult, if not impossible, to deduce
by other means. These advances have occurred because
it was recognized from the beginning that the steady-state
analysis of S-systems reduces to conventional linear analysis
in a logarithmic space. Hence, one was able to exploit the
powerful methods already developed for linear systems. For
example, S-systems have an explicit analytical solution for
the  steady  state.14,27  The  condition  for  the  existence  of  such  a 
steady  state  reduces  to  the  evaluation  of  a  simple  determinant 
involving  the  exponential  parameters  of  the  S-system.  Local 
stability  is  determined  by  two  critical  conditions,  one  involv¬ 
ing  only  the  exponential  parameters  and  the  other  involving 
these  as  well  as  the  multiplicative  parameters.  Steady-state 
(logarithmic)  gain  matrices  provide  a  complete  network 
analysis  of  the  signals  that  propagate  through  the  system. 
Similarly,  steady-state  sensitivity  matrices  provide  a  com¬ 
plete  sensitivity  analysis  of  the  parameters  that  define  the 
system  and  its  robustness.  The  linear  structure  also  permits 
the  use  of  well-developed  optimization  theory  such  as  the 
simplex  method.29 
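
The reduction of the steady-state problem to linear algebra in logarithmic space can be sketched as follows. Setting the positive and negative terms of each S-system equation equal and taking logarithms gives, for y_j = ln X_j, the linear system sum_j (g_ij - h_ij) y_j = ln(beta_i / alpha_i). The code below is a minimal illustration with hypothetical function names and example parameters; it uses a small hand-written Gaussian elimination so that it depends only on the standard library.

```python
import math


def solve_linear(a, b):
    """Solve a * y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Vanishing determinant: no unique steady state exists.
            raise ValueError("singular system of exponential parameters")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - partial) / m[r][r]
    return y


def s_system_steady_state(alpha, beta, g, h, x_indep):
    """Return the steady-state values of the n dependent variables.

    Each row of g and h holds n + m kinetic orders: the first n refer to
    dependent variables, the last m to the independent ones in x_indep.
    """
    n = len(alpha)
    y_indep = [math.log(x) for x in x_indep]
    a_dep = [[g[i][j] - h[i][j] for j in range(n)] for i in range(n)]
    b = []
    for i in range(n):
        indep_part = sum((g[i][n + j] - h[i][n + j]) * y_indep[j]
                         for j in range(len(x_indep)))
        b.append(math.log(beta[i] / alpha[i]) - indep_part)
    return [math.exp(v) for v in solve_linear(a_dep, b)]


if __name__ == "__main__":
    # Hypothetical two-variable pathway with feedback inhibition.
    alpha = [2.0, 1.0]
    beta = [1.0, 1.0]
    g = [[0.0, -0.5, 1.0], [1.0, 0.0, 0.0]]
    h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(s_system_steady_state(alpha, beta, g, h, x_indep=[4.0]))
```

The singular-matrix check corresponds to the determinant condition for the existence of a steady state mentioned in the text; gain and sensitivity matrices would follow from the inverse of the same coefficient matrix.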

Analytical  solutions  for  the  local  dynamic  behavior  are 
available,  including  eigenvalue  analysis  for  characterization 
of  the  relaxation  times.30  The  regular  structure  also  allows 
the  conditions  for  Hopf  bifurcation  to  be  expressed  as  a 
simple  formula  involving  the  exponential  parameters.31 
However,  S-systems  are  ultimately  nonlinear  systems  and  so 
there  is  no  analytical  solution  for  dynamic  behavior  outside 
the  range  of  accurate  linear  representation,  which  is  more 
restrictive  than  the  range  of  accurate  power-law  representa¬ 
tion.  Determination  of  the  local  dynamic  behavior  within  this 
larger  range,  and  the  determination  of  global  dynamic  behav¬ 
ior,  requires  numerical  methods. 

An  example  of  what  can  be  done  along  these  lines  is  the 
efficient  algorithm  developed  for  solving  differential  equa¬ 
tions  represented  in  the  canonical  power-law  formalism.32 
This  algorithm,  when  combined  with  recasting,16  can  be  used 
to  obtain  solutions  for  rather  arbitrary  nonlinear  differential 
equations.  More  significantly,  this  canonical  approach  has 
been  shown  to  yield  solutions  in  shorter  time  and  with 
greater  accuracy,  reliability,  and  predictability  than  is  typi¬ 
cally  possible  with  other  methods.  This  algorithm  can  be 
applied  to  other  canonical  formalisms  as  well  as  to  all  rep¬ 
resentations  within  the  power-law  formalism.  This  algorithm 
has been implemented in a user-friendly program called PLAS
(Power-Law  Analysis  and  Simulation),  which  is  available  on 
the  web  (http://correio.cc.fc.ul.pt/~aenf/plas.html). 

Another example is an algorithm based on the S-system
representation that finds multiple roots of nonlinear algebraic
equations.33,34 Recasting allows one to express rather general
nonlinear equations in the GMA-system representation
within the power-law formalism. The steady states of the
GMA-system, which correspond to the roots of the original
algebraic equation, cannot be obtained analytically. However,
these  power-law  equations  can  be  solved  iteratively  using  a 
local  S-system  representation,  which  amounts  to  a  Newton 
method  in  logarithmic  space.  Each  step  makes  use  of  the 
analytical  solution  that  is  available  with  the  S-system  repre¬ 
sentation  (see  earlier  in  this  work).  The  method  is  robust  and 
converges  rapidly.33  Choosing  initial  conditions  to  be  the 
solution  for  an  S-system  with  terms  selected  in  a  combina¬ 
torial  manner  from  among  the  terms  of  the  larger  GMA- 
system  has  been  shown  to  find  many,  and  in  some  cases  all, 
of  the  roots  for  the  original  equations.34 
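
The iteration described above can be sketched for a single variable. At each step the sums of power-law terms for synthesis and degradation are each replaced by one power-law term with the same value and slope in log space, and the resulting S-system steady state is solved exactly. The example equation, function names, and starting points are assumptions for illustration; the combinatorial choice of initial conditions from Ref. 34 is not reproduced.

```python
import math

def power_sum(coeffs, exps, x):
    """Evaluate V(x) = sum_i c_i * x**e_i."""
    return sum(c * x ** e for c, e in zip(coeffs, exps))

def local_kinetic_order(coeffs, exps, x):
    """Return d ln V / d ln x, the exponent of the local power-law fit."""
    weighted = sum(c * x ** e * e for c, e in zip(coeffs, exps))
    return weighted / power_sum(coeffs, exps, x)

def gma_root(pos, neg, x0, tol=1e-12, max_iter=100):
    """Find a steady state of dX/dt = V_plus(X) - V_minus(X).

    pos and neg are (coefficients, exponents) pairs. Each iteration solves
    the local S-system alpha * X**g = beta * X**h, which is a Newton step
    on ln V_plus - ln V_minus in the variable y = ln X.
    """
    (a, g_exps), (b, h_exps) = pos, neg
    y = math.log(x0)
    for _ in range(max_iter):
        x = math.exp(y)
        g = local_kinetic_order(a, g_exps, x)
        h = local_kinetic_order(b, h_exps, x)
        if g == h:
            raise ZeroDivisionError("local S-system is singular at this point")
        mismatch = math.log(power_sum(a, g_exps, x)) - math.log(power_sum(b, h_exps, x))
        y_new = y - mismatch / (g - h)
        if abs(y_new - y) < tol:
            return math.exp(y_new)
        y = y_new
    raise RuntimeError("iteration did not converge")

# Hypothetical example: dX/dt = (2 + X**2) - 3*X has steady states X = 1 and X = 2.
pos_terms = ([2.0, 1.0], [0.0, 2.0])
neg_terms = ([3.0], [1.0])
root_low = gma_root(pos_terms, neg_terms, x0=0.5)
root_high = gma_root(pos_terms, neg_terms, x0=3.0)
```

Different starting points lead to different roots, which is why the choice of initial conditions matters when all roots are sought.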


3.  Mathematically  controlled  comparison  of 
alternatives 

The  existence  of  an  explicit  solution  allows  for  the  ana¬ 
lytical  specification  of  systemic  constraints  or  invariants  that 
provide  the  basis  for  the  method  of  mathematically  con¬ 
trolled  comparisons.10,11,27,30,35,36  The  method  involves  the 
following  steps.  (1)  The  alternatives  being  compared  are  re¬ 
stricted  to  having  differences  in  a  single  specific  process  that 
remains  embedded  within  its  natural  milieu.  (2)  The  values 
of  the  parameters  that  characterize  the  unaltered  processes  of 
one  system  are  assumed  to  be  strictly  identical  with  those  of 
the  corresponding  parameters  of  the  alternative  system.  This 
equivalence  of  parameter  values  within  the  systems  is  called 
internal  equivalence.  It  provides  a  means  of  nullifying  or 
diminishing  the  influence  of  the  background,  which  in  com¬ 
plex  systems  is  largely  unknown.  (3)  Parameters  associated 
with  the  changed  process  are  initially  free  to  assume  any 
value.  This  allows  the  creation  of  new  degrees  of  freedom. 
(4)  The  extra  degrees  of  freedom  are  then  systematically  re¬ 
duced  by  imposing  constraints  on  the  external  behavior  of 
the  systems,  e.g.,  by  insisting  that  signals  transmitted  from 
input  (independent  variables)  to  output  (dependent  variables) 
be  amplified  by  the  same  factor  in  the  alternative  systems.  In 
this  way  the  two  systems  are  made  as  nearly  equivalent  as 
possible  in  their  interactions  with  the  outside  environment. 
This  is  called  external  equivalence.  (5)  The  constraints  im¬ 
posed  by  external  equivalence  fix  the  values  of  the  altered 
parameters  in  such  a  way  that  arbitrary  differences  in  sys¬ 
temic  behavior  are  eliminated.  Functional  differences  that 
remain  between  alternative  systems  with  maximum  internal 
and  external  equivalence  constitute  irreducible  differences. 
(6)  When  all  degrees  of  freedom  have  been  eliminated,  and 
the  alternatives  are  as  close  to  equivalent  as  they  can  be,  then 
comparisons  are  made  by  rigorous  mathematical  and  com¬ 
puter  analyses  of  the  alternatives. 

Two  key  features  of  this  method  should  be  noted.  First, 
because  much  of  the  analysis  can  be  carried  out  symboli¬ 
cally,  the  results  are  often  independent  of  the  numerical  val¬ 
ues  for  particular  parameters.  This  is  a  marked  advantage 
because  one  does  not  know,  and  in  many  cases  it  would  be 
impractical  to  obtain,  all  the  parameter  values  of  a  complex 
system.  Second,  the  method  allows  one  to  determine  the  rela¬ 
tive  optima  of  alternative  designs  without  actually  having  to 
carry  out  an  optimization  (i.e.,  without  having  to  determine 
explicit  values  for  the  parameters  that  optimize  the  perfor¬ 
mance  of  a  given  design).  If  one  can  show  that  a  given  de¬ 
sign  with  an  arbitrary  set  of  parameter  values  is  always  su¬ 
perior  to  the  alternative  design  that  has  been  made  internally 
and  externally  equivalent,  whether  or  not  the  set  of  param¬ 
eter  values  represents  an  optimum  for  either  design,  then  one 
has  proved  that  the  given  design  will  be  superior  to  the  al¬ 
ternative  even  if  the  alternative  were  assigned  a  parameter 
set  that  optimized  its  performance.  This  feature  is  a  decided 
advantage  because  one  can  avoid  the  difficult  procedure  of 
optimizing  complex  nonlinear  systems. 

The  method  of  mathematically  controlled  comparison 
has  been  used  for  some  time  to  determine  which  of  two 
alternative  regulatory  designs  is  better  according  to  specific 
quantitative criteria for functional effectiveness. In some
cases, as noted above, the results obtained using this technique
are general and qualitatively clear cut, i.e., one design
is always better than another, independent of parameter values.
For example, consider some systemic property, say a
particular parameter sensitivity, whose magnitude should be
as small as possible. In many cases, the ratio of this property
in the alternative design relative to that in the reference design
has the form R = A/(A + B), where A and B are positive
quantities with a distinct composition involving many individual
parameters. Such a ratio is always less than one,
which indicates that the alternative design is superior to the
reference design with regard to this desirable property. In
other  cases,  the  results  might  be  general  but  difficult  to  dem¬ 
onstrate  because  the  ratio  has  a  more  complex  form,  and 
comparisons  made  with  specific  values  for  the  parameters 
can  help  to  clarify  the  situation.  In  either  of  these  cases, 
comparisons  made  with  specific  values  for  the  parameters 
also  can  provide  a  quantitative  answer  to  the  question  of  how 
much  better  one  of  the  alternatives  might  be. 

In  contrast  to  the  cases  discussed  previously,  in  which  a 
clear-cut  qualitative  difference  exists,  a  more  ambiguous  re¬ 
sult  is  obtained  when  either  of  the  alternatives  can  be  better, 
depending on the specific values of the parameters. For example,
the ratio of some desirable systemic property in the
alternative design relative to that in the reference design has
the form R = (A + C)/(A + B), where A, B, and C are positive
quantities with a distinct composition involving many
individual parameters. For some values of the individual
parameters C > B and for other values C < B, so there is no
clear-cut qualitative result. A numerical approach to this
problem  has  recently  been  developed  that  combines  the 
method  of  mathematically  controlled  comparison  with  statis¬ 
tical  techniques  to  yield  numerical  results  that  are  general  in 
a  statistical  sense.37  This  approach  retains  some  of  the  gen¬ 
erality  that  makes  mathematically  controlled  comparison  so 
attractive,  and  at  the  same  time  provides  quantitative  results 
that  are  lacking  in  the  qualitative  approach. 
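
The difference between the two kinds of ratio can be illustrated by random sampling. The sketch below draws A, B, and C independently and log-uniformly, which is an arbitrary assumption and not the sampling scheme of Ref. 37; in a real comparison these quantities would be computed from sampled parameter sets of the internally and externally equivalent models.

```python
import random

def sample_ratios(n_samples=10000, seed=0):
    """Return the fractions of samples with A/(A+B) < 1 and (A+C)/(A+B) < 1."""
    rng = random.Random(seed)
    simple_below_one = 0
    mixed_below_one = 0
    for _ in range(n_samples):
        # Hypothetical positive quantities spanning four orders of magnitude.
        a = 10.0 ** rng.uniform(-2.0, 2.0)
        b = 10.0 ** rng.uniform(-2.0, 2.0)
        c = 10.0 ** rng.uniform(-2.0, 2.0)
        if a / (a + b) < 1.0:
            simple_below_one += 1
        if (a + c) / (a + b) < 1.0:
            mixed_below_one += 1
    return simple_below_one / n_samples, mixed_below_one / n_samples

frac_simple, frac_mixed = sample_ratios()
```

With these assumptions the first fraction is exactly one, since A/(A + B) < 1 for all positive A and B, whereas the second lies strictly between zero and one and is meaningful only as a statistical statement about the sampled ensemble.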

IV.  EXAMPLES  OF  DESIGN  PRINCIPLES  FOR 
ELEMENTARY  GENE  CIRCUITS 

Each  design  feature  of  gene  circuits  allows  for  several 
differences  in  design.  Our  goal  is  to  discover  the  design  prin¬ 
ciples,  if  such  exist,  that  would  allow  one  to  make  predic¬ 
tions  concerning  which  of  the  different  designs  would  be 
selected  under  various  conditions.  For  most  features,  the  de¬ 
sign  principles  are  unknown,  and  we  are  currently  unable  to 
predict  which  design  among  a  variety  of  well-characterized 
designs  might  be  selected  in  a  given  context.  In  a  few  cases, 
as  reviewed  later,  principles  have  been  uncovered.  There  are 
simple  rules  that  predict  whether  the  mode  of  control  will  be 
positive  or  negative,  whether  elementary  circuits  will  be  di¬ 
rectly  coupled,  inversely  coupled,  or  uncoupled,  and  whether 
gene  expression  will  switch  in  a  static  or  dynamic  fashion. 
More  subtle  conditions  relate  the  logic  of  gene  expression  to 
the  context  provided  by  the  life  cycle  of  the  organism. 

A.  Molecular  mode  of  control 

A simple demand theory based on selection allows one
to predict the molecular mode of gene control. This theory
states that the mode of control is correlated with the demand
for gene expression in the organism's natural environment:
positive when demand is high and negative when demand is
low. Development of this theory involved elucidating functional
differences, determining the consequences of mutational
entropy (the tendency for random mutations to degrade
highly ordered structures rather than contribute to their
formation), and examining selection in alternative environments.

TABLE I. Predicted correlation between molecular mode of control and the
demand for gene expression in the natural environment.

                           Mode of control
Demand for expression      Positive      Negative
High                       Selected      Lost
Low                        Lost          Selected

Detailed  analysis  involving  mathematically  controlled 
comparisons  demonstrates  that  model  gene  circuits  with  the 
alternative  modes  of  control  behave  identically  in  most  re¬ 
spects.  However,  they  respond  in  diametrically  opposed 
ways  to  mutations  in  the  control  elements  themselves.27  Mu¬ 
tational  entropy  leads  to  loss  of  control  in  each  case.  How¬ 
ever,  this  is  manifested  as  super-repressed  expression  in  cir¬ 
cuits  with  the  positive  mode  of  control,  and  constitutive 
expression  in  circuits  with  the  negative  mode.  The  dynamics 
of  mixed  populations  of  organisms  that  harbor  either  the  mu¬ 
tant  or  the  wild-type  control  mechanism  depend  on  whether 
the  demand  for  gene  expression  in  the  environment  is  high  or 
low.38  The  results  are  summarized  in  Table  I.  The  basis  for 
these  results  can  be  understood  in  terms  of  the  following 
qualitative  argument  involving  extreme  environments. 

A  gene  with  a  positive  mode  of  control  and  a  high  de¬ 
mand  for  its  expression  will  be  induced  normally  if  the  con¬ 
trol  mechanism  is  wild  type.  It  will  be  uninduced  if  the  con¬ 
trol  mechanism  is  mutant,  and,  since  expression  cannot  meet 
the  demand  in  this  case,  the  organism  harboring  the  mutant 
mechanism  will  be  selected  against.  In  other  words,  the  func¬ 
tional  positive  mode  of  control  will  be  selected  when  mutant 
and  wild-type  organisms  grow  in  a  mixed  population.  On  the 
other  hand,  in  an  environment  with  a  low  demand  for  expres¬ 
sion,  the  gene  will  be  uninduced  in  both  wild-type  and  mu¬ 
tant  organisms  and  there  will  be  no  selection.  Instead,  the 
mutants  will  accumulate  with  time  because  of  mutational 
entropy,  and  the  wild-type  organisms  with  the  functional 
positive  mode  of  control  will  be  lost. 
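
The qualitative argument can be made concrete with a deliberately simple recurrence for the wild-type fraction in a mixed population. This is an illustrative sketch, not the population model of Ref. 38: the one-way mutation rate, the relative fitness of the mutant, and the function name are all assumptions.

```python
def wild_type_fraction(generations, mutation_rate, mutant_fitness, p0=1.0):
    """Iterate mutation followed by selection; return the wild-type fraction.

    mutation_rate: per-generation probability that a wild-type control
        mechanism is lost (mutational entropy, assumed one-way).
    mutant_fitness: growth of the mutant relative to wild type (1.0 = neutral).
    """
    p = p0
    for _ in range(generations):
        p_after_mutation = p * (1.0 - mutation_rate)
        q_after_mutation = 1.0 - p_after_mutation
        # Selection rescales each class by its relative fitness.
        p = p_after_mutation / (p_after_mutation + mutant_fitness * q_after_mutation)
    return p

# Positive mode, high demand: the uninduced mutant cannot meet the demand,
# so it is assumed to grow 10% more slowly than wild type.
high_demand = wild_type_fraction(10000, mutation_rate=1e-3, mutant_fitness=0.9)

# Positive mode, low demand: mutant and wild type are both uninduced,
# so the mutant is assumed to be selectively neutral.
low_demand = wild_type_fraction(10000, mutation_rate=1e-3, mutant_fitness=1.0)
```

Under these assumptions the functional control mechanism is maintained near fixation when the mutant is selected against, and decays toward zero when there is no selection.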

The  results  for  the  negative  mode  of  control  are  just  the 
reverse.  A  gene  with  a  negative  mode  of  control  and  a  low 
demand  for  its  expression  will  be  uninduced  normally  if  the 
control  mechanism  is  wild  type.  It  will  be  constitutively  in¬ 
duced  if  the  control  mechanism  is  mutant,  and,  since  inap¬ 
propriate  expression  in  time  or  space  tends  to  be  dysfunc¬ 
tional,  the  organism  harboring  the  mutant  mechanism  will  be 
selected  against.  In  other  words,  the  functional  negative 
mode  of  control  will  be  selected  when  mutant  and  wild-type 
organisms  grow  in  a  mixed  population.  On  the  other  hand,  in 
an  environment  with  a  high  demand  for  expression,  the  gene 
will be induced in both wild-type and mutant organisms and
there will be no selection. Instead, the mutants will accumulate
with time because of mutational entropy, and the wild-type
organisms with the functional negative mode of control
will be lost.

TABLE II. General predictions regarding the mode of control for regulation
of cell-specific functions in differentiated cell types.^a

                 Cell-specific functions
Cell type        A             B
A                Positive      Negative
B                Negative      Positive

^a See Fig. 7 and discussion in the text for a specific example.

The  predictions  of  demand  theory  are  in  agreement  with 
nearly  all  individual  examples  for  which  both  the  mode  of 
control  and  the  demand  for  expression  are 
well-documented.39  On  the  basis  of  this  strong  correlation, 
one  can  make  predictions  concerning  the  mode  of  control 
when  the  natural  demand  for  expression  is  known,  or  vice 
versa.  Moreover,  when  knowledge  of  cellular  physiology 
dictates  that  pairs  of  regulated  genes  should  be  subject  to  the 
same  demand  regime,  even  if  it  is  unknown  whether  the 
demand  in  the  natural  environment  is  high  or  low,  then  de¬ 
mand  theory  allows  one  to  predict  that  the  mode  of  control 
will  be  of  the  same  type  for  both  genes.  Conversely,  when 
such  genes  should  be  subject  to  opposite  demand  regimes, 
and  again  even  if  it  is  unknown  whether  the  demand  in  the 
natural  environment  is  high  or  low,  then  demand  theory  al¬ 
lows  one  to  predict  that  the  mode  of  control  will  be  of  the 
opposite  type  for  these  genes.  The  value  of  such  predictions 
is  that  once  the  mode  of  control  is  determined  experimentally 
for  one  of  the  two  genes,  one  can  immediately  predict  the 
mode  of  control  for  the  other. 

Straightforward  application  of  demand  theory  to  the  con¬ 
trol  of  cell-specific  functions  in  differentiated  cell  types  not 
only  makes  predictions  about  the  mode  of  control  for  these 
functions  in  each  of  the  cell  types,  but  also  makes  the  sur¬ 
prising  prediction  that  the  mode  of  control  itself  ought  to 
undergo  switching  during  differentiation  from  one  cell  type 
to  another.40  Table  II  summarizes  the  general  predictions, 
and  Fig.  7  provides  a  specific  example  of  a  simple  model 
system, cells of Escherichia coli infected with the temperate
bacteriophage λ, that fulfills these predictions. During lytic
growth (cell type A in Table II), the lytic functions
(A-specific functions) are in high demand and are predicted to
involve the positive mode of control. Indeed, they are
controlled by the N gene product, which is an anti-terminator
exercising a positive mode of control. At the same time, the
lysogenic functions (B-specific functions) are in low demand
and are predicted to involve the negative mode of control. In
this case, they are controlled by the cro gene product,
which is a repressor exercising a negative mode of control.
Conversely, during lysogenic growth (cell type B in Table
II), the lytic functions (A-specific functions) are in low
demand and are predicted to involve the negative mode of
control. Indeed, they are controlled by the cI gene product,
which is a repressor exercising a negative mode of control.
At the same time, the lysogenic functions (B-specific functions)
are in high demand and are predicted to involve the
positive mode of control. In this case, they are controlled by
the cI gene product, which is also an activator exercising a
positive mode of control. The mode in each individual case
is predicted correctly, and the switching of modes during
"differentiation" (from lysogenic to lytic growth or vice
versa) is brought about by the interlocking circuitry of
phage λ.

FIG. 7. Switching the mode of control for regulated cell-specific functions
during differentiation. The temperate bacteriophage λ can be considered a
simple model system that exhibits two differentiated forms: (Top panel) The
lytic form in which the phage infects a cell, multiplies to produce multiple
phage copies, lyses the cell, and the released progeny begin another cycle of
lytic growth. (Bottom panel) The lysogenic form in which the phage genome
is stably incorporated into the host cell DNA and is replicated passively
once each time the host genome is duplicated. During differentiation,
when the lysogenic phage is induced to become a lytic phage or the lytic
phage becomes a lysogenic phage upon infection of a bacterial cell, the
mode of control switches from positive to negative or vice versa because of
the interlocking gene circuitry of phage λ. See text for further discussion.

B.  Coupling  of  elementary  gene  circuits 

There  are  logically  just  three  patterns  of  coupling  be¬ 
tween  the  expression  cascades  for  regulator  and  effector  pro¬ 
teins  in  elementary  gene  circuits.  These  are  the  directly 
coupled,  uncoupled,  and  inversely  coupled  patterns  in  which 
regulator  gene  expression  increases,  remains  unchanged,  or 
decreases  with  an  increase  in  effector  gene  expression  (Fig. 
8).  Elementary  gene  circuits  in  bacteria  have  long  been  stud¬ 
ied  and  there  are  well-characterized  examples  that  exhibit 
each  of  these  patterns. 

A design principle governing the pattern of coupling in
such circuits has been identified by mathematically controlled
comparison of various designs.11 The principle is expressed
in terms of two properties: the mode of control
(positive or negative) and the capacity for regulated expression
(large or small ratio of maximal to basal level of expression).
According to this principle, one predicts that elementary
gene circuits with the negative mode and small,
intermediate, and large capacities for gene regulation will
exhibit direct coupling, uncoupling, and inverse coupling,
respectively. Circuits with the positive mode, in contrast, are
predicted to have inverse coupling, uncoupling, and direct
coupling.

FIG. 8. Coupling of expression in elementary gene circuits. The panel on
the right shows the steady-state expression characteristic for the effector
cascade in Fig. 6. The panel on the left shows the steady-state expression
characteristic for the regulator cascade. Induction of effector expression
occurs while regulator expression increases (directly coupled expression),
remains unchanged (uncoupled expression), or decreases (inversely coupled
expression).

The  approach  used  to  identify  this  design  principle  in¬ 
volves  (1)  formulating  kinetic  models  that  are  sufficiently 
generic  to  include  all  of  the  logical  possibilities  for  coupling 
of  expression  in  elementary  gene  circuits,  (2)  making  these 
models  equivalent  in  all  respects  other  than  their  regulatory 
parameters,  (3)  identifying  a  set  of  a  priori  criteria  for  func¬ 
tional  effectiveness  of  such  circuits,  (4)  analyzing  the  steady- 
state  and  dynamic  behavior  of  the  various  designs,  and  (5) 
comparing  the  results  to  determine  which  designs  are  better 
according  to  the  criteria.  These  steps  are  outlined  next. 

The kinetic models are all special cases of the generic
model that is graphically depicted in Fig. 6. This model,
which captures the essential features of many actual circuits,
includes two transcription units: one for a regulator gene and
another for a set of effector genes. The regulator gene encodes
a protein that acts at the level of transcription to bring
about induction, and the effector genes encode the enzymes
that catalyze a pathway of reactions in which the inducer is
an intermediate. The regulator can negatively or positively
influence transcription at the promoter of each transcription
unit, and these influences, whether negative or positive, can
be facilitated or antagonized by the inducer. A local power-law
representation that describes the regulatable region (i.e.,
the inclined portion) of the steady-state expression characteristics
in Fig. 8 is the following:


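$$\frac{dX_1}{dt} = \alpha_1 X_3^{g_{13}} X_5^{g_{15}} X_6^{g_{16}} - \beta_1 X_1^{h_{11}}, \tag{4}$$

$$\frac{dX_2}{dt} = \alpha_2 X_1^{g_{21}} X_7^{g_{27}} - \beta_2 X_2^{h_{22}}, \tag{5}$$

$$\frac{dX_3}{dt} = \alpha_3 X_2^{g_{32}} X_8^{g_{38}} - \beta_3 X_2^{h_{32}} X_3^{h_{33}}, \tag{6}$$

$$\frac{dX_4}{dt} = \alpha_4 X_3^{g_{43}} X_5^{g_{45}} X_6^{g_{46}} - \beta_4 X_4^{h_{44}}, \tag{7}$$

$$\frac{dX_5}{dt} = \alpha_5 X_4^{g_{54}} X_7^{g_{57}} - \beta_5 X_5^{h_{55}}. \tag{8}$$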

There are four parameters that characterize the pattern of
regulatory interactions: g13 and g43 quantify influences of
inducer X3 on the rate of synthesis of effector mRNA X1 and
regulator mRNA X4, whereas g15 and g45 quantify influences
of regulator X5 on these same processes.
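
Because Eqs. (4)-(8) have the S-system form, their steady state can be obtained by solving a linear system in logarithmic coordinates. The sketch below does this with plain Gaussian elimination. All numerical parameter values are hypothetical, X6, X7, and X8 are treated as fixed independent variables, and the exponents attached to X7 follow an assumed reading of the equations; none of these choices come from the original analysis.

```python
import math

DEPENDENT = [1, 2, 3, 4, 5]
INDEPENDENT = {6: 1.0, 7: 1.0, 8: 3.0}  # hypothetical fixed concentrations

# i: (alpha_i, {j: g_ij}, beta_i, {j: h_ij}); all values are hypothetical.
EQUATIONS = {
    1: (1.0, {3: 1.0, 5: -1.0, 6: 1.0}, 1.0, {1: 1.0}),  # Eq. (4)
    2: (1.0, {1: 1.0, 7: 1.0}, 1.0, {2: 1.0}),           # Eq. (5)
    3: (2.0, {2: 1.0, 8: 1.0}, 1.0, {2: 0.5, 3: 1.0}),   # Eq. (6)
    4: (1.0, {3: 0.5, 5: -0.5, 6: 1.0}, 1.0, {4: 1.0}),  # Eq. (7)
    5: (1.0, {4: 1.0, 7: 1.0}, 1.0, {5: 1.0}),           # Eq. (8)
}

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: no isolated steady state")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def steady_state(equations, independent):
    """Solve sum_j (g_ij - h_ij) ln X_j = ln(beta_i / alpha_i) for dependent X_j."""
    a_mat, b_vec = [], []
    for i in DEPENDENT:
        alpha, g, beta, h = equations[i]
        a_mat.append([g.get(j, 0.0) - h.get(j, 0.0) for j in DEPENDENT])
        rhs = math.log(beta / alpha)
        for k, x_k in independent.items():
            rhs -= (g.get(k, 0.0) - h.get(k, 0.0)) * math.log(x_k)
        b_vec.append(rhs)
    y = solve_linear(a_mat, b_vec)
    return {i: math.exp(y_i) for i, y_i in zip(DEPENDENT, y)}

def net_rate(i, x, equations):
    """Evaluate the right-hand side of equation i at concentrations x."""
    alpha, g, beta, h = equations[i]
    synthesis = alpha * math.prod(x[j] ** e for j, e in g.items())
    degradation = beta * math.prod(x[j] ** e for j, e in h.items())
    return synthesis - degradation

state = steady_state(EQUATIONS, INDEPENDENT)
full_state = dict(state)
full_state.update(INDEPENDENT)
residuals = [net_rate(i, full_state, EQUATIONS) for i in DEPENDENT]
```

The residuals provide a direct check: each net rate evaluated at the computed steady state should vanish to within rounding error.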


TABLE III. Predicted patterns of coupling for regulator and effector
cascades in elementary gene circuits.

Mode of control    Capacity for regulation*    Pattern of coupling
Positive           Large                       Directly coupled
Positive           Intermediate                Uncoupled
Positive           Small                       Inversely coupled
Negative           Large                       Inversely coupled
Negative           Intermediate                Uncoupled
Negative           Small                       Directly coupled

*Capacity for regulation is defined as the ratio of maximal to basal level of
expression.

In  the  various  models,  the  values  for  all  corresponding 
parameters  other  than  the  four  regulatory  parameters  are 
made  equal  (internal  equivalence).  The  four  regulatory  pa¬ 
rameters  have  their  values  constrained  so  as  to  produce  the 
same  steady-state  expression  characteristics  (external  equiva¬ 
lence).  Models  exhibiting  each  of  the  three  patterns  of  cou¬ 
pling  are  represented  within  the  space  of  the  constrained 
regulatory  parameters. 

Six  quantitative,  a  priori  criteria  for  functional  effective¬ 
ness  are  used  as  a  basis  for  comparing  the  behavior  of  the 
various  models.  These  are  decisiveness,  efficiency,  selectiv¬ 
ity,  stability,  robustness,  and  responsiveness.  A  decisive  sys¬ 
tem  has  a  sharp  threshold  for  response  to  substrate.  An  effi¬ 
cient  system  makes  a  large  amount  of  product  from  a  given 
supra-threshold  increment  in  substrate.  A  selective  system 
governs  the  amount  of  regulator  so  as  to  ensure  specific  con¬ 
trol  of  effector  gene  expression.  A  locally  stable  system  re¬ 
turns  to  its  original  state  following  a  small  perturbation.  A 
robust  system  tends  to  maintain  its  state  despite  changes  in 
parameter  values  that  determine  its  structure.  A  responsive 
system  quickly  adjusts  to  changes.  (Further  discussion  of 
these  criteria  and  the  means  by  which  they  are  quantified  can 
be  found  elsewhere.11) 

The  steady-state  and  dynamic  behavior  of  the  various 
models  is  analyzed  by  standard  algebraic  and  numerical 
methods,  and  the  results  are  quantified  according  to  the 
above  criteria.  Temporal  responsiveness  is  a  distinguishing 
criterion  for  effectiveness  of  these  circuits.  A  comparison  of 
results  for  models  with  the  various  patterns  of  coupling  leads 
to  the  predicted  correlations  summarized  in  Table  III. 
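
For use alongside experimental data, the predictions of Table III can be encoded as a simple lookup. The function name and the string labels are assumptions; the mapping itself is taken directly from the table.

```python
# Predicted pattern of coupling, keyed by (mode of control, capacity for regulation).
PREDICTED_COUPLING = {
    ("positive", "large"): "directly coupled",
    ("positive", "intermediate"): "uncoupled",
    ("positive", "small"): "inversely coupled",
    ("negative", "large"): "inversely coupled",
    ("negative", "intermediate"): "uncoupled",
    ("negative", "small"): "directly coupled",
}

def predicted_coupling(mode, capacity):
    """Return the Table III prediction for a mode of control and capacity."""
    return PREDICTED_COUPLING[(mode.lower(), capacity.lower())]

example = predicted_coupling("Negative", "Large")
```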

To  test  these  predicted  correlations  we  identified  32  el¬ 
ementary  gene  circuits  for  which  the  mode  of  control  was 
known  and  for  which  quantitative  data  regarding  the  capaci¬ 
ties  for  regulator  and  effector  gene  expression  were  available 
in  the  literature.  A  plot  of  these  data  in  Fig.  9  shows  reason¬ 
able  agreement  with  the  predicted  positive  slope  for  the 
points  representing  circuits  with  a  positive  mode  and  the 
predicted  negative  slope  for  the  points  representing  circuits 
with  a  negative  mode.  Global  experiments  that  utilize  mi¬ 
croarray  technology  could  provide  more  numerous  and  po¬ 
tentially  more  accurate  tests  of  these  predictions. 

C.  Connectivity  and  switching 

Gene expression can be switched ON (and OFF) in either
a discontinuous dynamic fashion or a continuously variable
static fashion in response to developmental or environmental
cues. These alternative switch behaviors are clearly
manifested in the steady-state expression characteristic of the
gene. In some cases, the elements of the circuitry appear to
be the same, and yet the alternative behaviors can be generated
by the way in which the elements are interconnected.
This design feature has been examined in simple model circuits.
The results have led to specific conditions that allow
one to distinguish between these alternatives, and these
conditions can be used to interpret the results of experiments
with the lac operon of E. coli.

FIG. 9. Experimental data for the coupling of expression in elementary gene
circuits. The capacity for induction of the effector cascade is plotted on the
horizontal axis, Log (Effector gene expression), as positive values. The
capacity for expression of the regulator cascade is plotted on the vertical
axis as positive values (induction), negative values (repression), or zero (no
change in expression). Effector cascades having a positive mode of control
are represented as data points with filled symbols and those having a
negative mode with open symbols. Data show reasonably good agreement with
the predictions in Table III.

A  design  principle  that  distinguishes  between  discon¬ 
tinuous  and  continuous  switches  in  a  model  for  inducible 
catabolic  pathways  (Fig.  10)  is  the  following.  If  the  natural 
inducer  is  the  initial  substrate  of  the  inducible  pathway,  or  if 
it  is  an  intermediate  in  the  inducible  pathway,  then  the  switch 
will  be  continuous;  if  the  inducer  is  the  final  product  of  the 
inducible  pathway,  then  the  switch  can  be  discontinuous  or 
continuous,  depending  on  an  algebraic  condition  that  in¬ 
volves  four  kinetic  orders  for  reactions  in  the  circuit.  (A 
more  general  statement  of  the  principle  can  be  given  in  terms 
of  the  algebraic  condition,  as  will  be  shown  below.) 

A  simplified  set  of  equations  that  captures  the  essential 
features  of  the  model  in  Fig.  10  is  the  following: 


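$$\frac{dX_1}{dt} = \alpha_{1B} - \beta_1 X_1, \qquad X_3 < X_{3L}, \tag{9a}$$

$$\frac{dX_1}{dt} = \alpha_1 X_3^{g_{13}} - \beta_1 X_1, \qquad X_{3L} < X_3 < X_{3H}, \tag{9b}$$

$$\frac{dX_1}{dt} = \alpha_{1M} - \beta_1 X_1, \qquad X_{3H} < X_3, \tag{9c}$$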

FIG. 10. Simplified model of an inducible catabolic pathway exhibits two types of switch behavior depending upon the position of the inducer in the inducible
pathway. The plots have axes ln X3 and ln X4. (a) The inducer is the final product of the inducible pathway. (b) The S-shaped curve is the steady-state solution for Eqs. (9) and (10). The lines (a,
b, and c) are the steady-state solutions for Eq. (11) with different fixed concentrations of the stimulus X4. The steady-state solutions for the system are given
by the intersections of the S-shaped curve and the straight lines. There is only one intersection (maximal expression) when ln X4 > a; there is only one
intersection (basal expression) when ln X4 < b. There are three intersections when b < ln X4 = c < a, but the middle one is unstable. The necessary and sufficient
condition for the bistable behavior in this context is that the slope of the straight line be less than the slope of the S-shaped curve at intermediate concentrations
of the inducer X3, which is the condition expressed in Eq. (12). (c) The steady-state induction characteristic exhibits discontinuous dynamic switches and a
well-defined hysteresis loop. Thus, at intermediate concentrations of the stimulus X4, expression will be at either the maximal or the basal level depending
upon the past history of induction. (d) The inducer is an intermediate in the inducible pathway. (e) The steady-state solutions for the system are given by the
intersections of the S-shaped curve and vertical lines. There is only one intersection possible for any given concentration of stimulus. (f) The steady-state
induction characteristic exhibits a continuously changing static switch. (g) The inducer is the initial substrate of the inducible pathway. (h) The steady-state
solutions for the system are given by the intersections of the S-shaped curve and the lines of negative slope. There is only one intersection possible for any
given concentration of stimulus. (i) The steady-state induction characteristic exhibits a continuously changing static switch.

TABLE IV. Summary of predictions relating type of switch behavior to the connectivity in the model inducible
circuit of Fig. 10.

Figure    Stimulus       Inducer        Transport       Connection from inducible pathway    Switch
10(d)     IPTG           IPTG           Constitutive    None                                 Static
10(a)     IPTG           IPTG           Inducible       Product                              Dynamic
10(d)     Lactose        Allolactose    Inducible       Intermediate                         Static
10(g)     Allolactose    Allolactose    Constitutive    Substrate                            Static

$$\frac{dX_2}{dt} = \alpha_2 X_1 - \beta_2 X_2, \tag{10}$$

$$\frac{dX_3}{dt} = \alpha_3 X_2^{g_{32}} X_4^{g_{34}} - \beta_3 X_2^{h_{32}} X_3^{h_{33}}. \tag{11}$$

The variables X1, X2, X3, and X4 represent the concentrations
of polycistronic mRNA, a coordinately regulated set of
proteins, inducer, and stimulus, respectively. This is a piecewise
power-law representation (see Appendix of Ref. 27)
that emphasizes distinct regions of operation. There is a constant
basal level of expression when inducer concentration
X3 is lower than a value X3L; there is a constant maximal
level of expression when inducer concentration is higher
than a value X3H; there is a regulated level of expression
(with cooperativity indicated by a value of the parameter
g13 > 1) when inducer concentration is between the values
X3L and X3H. All parameters in this model have positive
values.

The position of the natural inducer in an inducible pathway
has long been known to have a profound effect on the
local stability of the steady state when the system is operating
on the inclined portion (i.e., the regulatable region) of the
steady-state expression characteristic (Fig. 8, right panel).27
As the position of the natural inducer is changed from the
initial substrate [Fig. 10(g)], to an intermediate [Fig. 10(d)],
to the final product [Fig. 10(a)] of the inducible pathway (all
other parameters having fixed values), the margin of stability
decreases. In this progression the single stable steady state
[Fig. 10(h)] can undergo a bifurcation to an unstable steady
state flanked by two stable steady states [Fig. 10(b)], which
is the well-known cusp catastrophe characteristic of a dynamic
ON-OFF switch.41

The critical conditions for the existence of multiple
steady states and a dynamic switch are given by

$$g_{13} > \frac{h_{33}}{g_{32} - h_{32}} \quad \text{and} \quad g_{32} > h_{32}. \tag{12}$$

In general, the inducible proteins must have a greater influence
on the synthesis ($g_{32}$) than on the degradation ($h_{32}$) of
the inducer. These conditions can be interpreted, according
to conventional assumptions, in terms of inducer position in
the pathway. If the position of the true inducer is functionally
equivalent to that of the substrate for the inducible pathway,
then $g_{32}=0$ and the conditions in Eq. (12) cannot be satisfied.
If the position is functionally equivalent to that of the
intermediate in the inducible pathway, the kinetic orders for
the rates of synthesis and degradation of the intermediate are
the same with respect to the enzymes for synthesis and degradation,
and these enzymes are coordinately induced, then
$g_{32}=h_{32}$ and again the conditions in Eq. (12) cannot be satisfied.
However, if the position is functionally equivalent to
that of the product for the inducible pathway, then $h_{32}=0$
and the conditions in Eq. (12) can be satisfied provided
$g_{13} > h_{33}/g_{32}$.
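
The origin of the conditions in Eq. (12) can be sketched in logarithmic coordinates $y_i = \ln X_i$. The sketch assumes that, in the regulated region, the mRNA balance has the power-law form $dX_1/dt = \alpha_1 X_3^{g_{13}} - \beta_1 X_1$; this form is inferred from the description of the piecewise representation and is an assumption here, as are the symbols $\alpha_1$, $\beta_1$, and $g_{34}$ (the kinetic order for the stimulus).

```latex
\begin{align*}
\text{Eq. (10) at steady state:} \quad & y_2 = y_1 + \ln(\alpha_2/\beta_2), \\
\text{assumed mRNA balance:} \quad & y_1 = g_{13}\, y_3 + \ln(\alpha_1/\beta_1), \\
\text{Eq. (11) at steady state:} \quad & (g_{32} - h_{32})\, y_2 = h_{33}\, y_3 - g_{34}\, y_4 - \ln(\alpha_3/\beta_3).
\end{align*}
% Expression curve: slope g_{13} in the (y_3, y_2) plane on the inclined portion,
% slope 0 on the basal and maximal plateaus.
% Pathway line: slope h_{33} / (g_{32} - h_{32}) when g_{32} \neq h_{32}.
% Three intersections require a line of positive slope that is less steep
% than the inclined portion of the expression curve:
\[
0 < \frac{h_{33}}{g_{32} - h_{32}} < g_{13}
\quad \Longleftrightarrow \quad
g_{32} > h_{32} \ \text{ and } \ g_{13} > \frac{h_{33}}{g_{32} - h_{32}} .
\]
```

Under these assumptions the three inducer positions correspond to a line of negative slope ($g_{32}=0$), a vertical line ($g_{32}=h_{32}$), and a line of positive slope ($h_{32}=0$), which is consistent with the geometric description in the caption of Fig. 10.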

The values of the parameters in this model have been
estimated from experimental data for the lac operon of E.
coli.10 These results, together with these data, can be used to
interpret four experiments involving the circuitry of the lac
operon (see Table IV and the following discussion).

First, if the lac operon is induced with the nonmetabolizable
(gratuitous) inducer isopropyl-β-D-thiogalactoside
(IPTG) in a cell with the inducible Lac permease protein,
then the model is as shown in Fig. 10(a). In this case, $X_1$ is
the concentration of polycistronic lac mRNA, $X_2$ is the concentration
of the Lac permease protein alone ($X_2$ has no
influence on the degradation of the inducer $X_3$), $X_3$ is the
intracellular concentration of IPTG, and $X_4$ is the extracellular
concentration of IPTG. With the parameter values from
the lac operon, the conditions in Eq. (12) are satisfied because
$h_{33}=1$ (aggregate loss by all causes in exponentially
growing cells is first order), $g_{32}=1$ (enzymatic rate is first
order with respect to the concentration of total enzyme), and
$g_{13}=2$ (lac transcription is second order with respect
to the concentration of inducer, i.e., the Hill coefficient is 2).

Second, if the lac operon is induced with the gratuitous
inducer IPTG in a cell without the Lac permease protein,
then the inducer IPTG is not acted upon by any of the protein
products of the operon. In this case, $X_1$ is the concentration
of polycistronic lac mRNA, $X_2$ is the concentration of
β-galactosidase protein alone ($X_2$ has no influence on either
the synthesis or the degradation of the inducer $X_3$), $X_3$ is the
intracellular concentration of IPTG, and $X_4$ is the extracellular
concentration of IPTG. The conditions in Eq. (12) now
cannot be satisfied because $g_{32}=h_{32}=0$ and all other parameters
are positive. This is an open-loop situation in which
expression of the operon is simply proportional to the rate of
transcription as determined by the steady-state concentration
of intracellular IPTG, which is proportional to the concentration
of extracellular IPTG.

Thus, the kinetic model accounts for two important observations
from previous experiments. It accounts for the
classic experimental results of Novick and Weiner42 in which
they observed a discontinuous dynamic switch with hysteresis.
They induced the lac operon with a gratuitous inducer
that was transported into the cell by the inducible Lac permease,
was diluted by cellular growth, but was not acted
upon by the remainder of the inducible pathway. Hence, the
gratuitous inducer occupied the position of final product for
the inducible pathway (in this case simply the Lac permease
step), and the model predicts dynamic bistable switch behavior
similar to that depicted in Figs. 10(b) and 10(c). The
kinetic model also accounts for the classic experimental results
of Sadler and Novick43 in which they observed a continuous
static switch without hysteresis. In their experiments
they used a mutant strain of E. coli in which the lac permease
was inactivated and they induced the lac operon with
a gratuitous inducer. In this system, the inducer is not acted
upon by any part of the inducible pathway, the extracellular
and intracellular concentrations of inducer are proportional,
and the model predicts a continuous static switch similar to
that depicted in Figs. 10(e) and 10(f). The model in Fig. 10
also makes two other predictions related to the position of
the natural inducer in the inducible pathway.

First, if the lac operon is induced with lactose in a cell
with all the inducible Lac proteins intact, then the model is
as shown in Fig. 10(d). In this case, $X_1$ is the concentration
of polycistronic lac mRNA, $X_2$ is the concentration of the
Lac permease protein as well as the concentration of the
β-galactosidase protein (which catalyzes both the conversion
of lactose to allolactose and the conversion of allolactose to
galactose and glucose), $X_3$ is the intracellular concentration
of allolactose, and $X_4$ is the extracellular concentration of
lactose. In steady state, the sequential conversion of extracellular
lactose to intracellular lactose (by Lac permease) and
intracellular lactose to allolactose (by β-galactosidase) can
be represented without loss of generality as a single process
because these two proteins are coordinately expressed.
Again, the conditions in Eq. (12) cannot be satisfied. In this
case, $g_{32}=h_{32}=1$, all other parameters are positive and finite,
and the model predicts a continuous static switch similar to
that depicted in Figs. 10(e) and 10(f).

Second, if the lac operon is induced with allolactose, the
natural inducer, in a cell without the Lac permease protein,
then the model is as shown in Fig. 10(g). In this case, $X_1$ is
the concentration of polycistronic lac mRNA, $X_2$ is the concentration
of the β-galactosidase protein alone (which catalyzes
the conversion of allolactose to galactose and glucose),
$X_3$ is the intracellular concentration of allolactose, and $X_4$ is
the extracellular concentration of allolactose. The conditions
in Eq. (12) cannot be satisfied. In this case, $g_{32}=0$ and all
other parameters are positive, and the model predicts a continuous
static switch similar to that represented in Figs. 10(h)
and 10(i).
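
The four cases above can be checked against the conditions in Eq. (12) with a few lines of code. This is a minimal sketch, not part of the original analysis: the function name `switch_type` and the dictionary layout are hypothetical, the values $g_{13}=2$ and $h_{33}=1$ are those quoted in the text for the lac operon, and $h_{32}=1$ in the allolactose case is an assumed illustrative value (the text states only that it is positive).

```python
def switch_type(g13, g32, h32, h33):
    """Classify the predicted switch using the two inequalities of Eq. (12).

    Returns "dynamic" when multiple steady states are possible and
    "static" otherwise. The first test short-circuits, so the division
    is never evaluated when g32 <= h32.
    """
    if g32 > h32 and g13 > h33 / (g32 - h32):
        return "dynamic"
    return "static"


# (label, figure, kinetic orders); g13 = 2 and h33 = 1 follow the text.
CASES = [
    ("IPTG, permease absent", "10(d)", dict(g13=2.0, g32=0.0, h32=0.0, h33=1.0)),
    ("IPTG, permease inducible", "10(a)", dict(g13=2.0, g32=1.0, h32=0.0, h33=1.0)),
    ("lactose, all Lac proteins", "10(d)", dict(g13=2.0, g32=1.0, h32=1.0, h33=1.0)),
    # h32 = 1 is an assumed value for illustration only.
    ("allolactose, permease absent", "10(g)", dict(g13=2.0, g32=0.0, h32=1.0, h33=1.0)),
]

PREDICTIONS = {label: switch_type(**orders) for label, _, orders in CASES}

if __name__ == "__main__":
    for label, figure, _ in CASES:
        print(f"Fig. {figure:6s} {label:30s} -> {PREDICTIONS[label]}")
```

With these values only the case of IPTG with inducible permease is classified as dynamic, in agreement with Table IV.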

The fact that the kinetic model of the lac operon predicts
a continuous static switch in response to extracellular lactose
led us to search the literature for the relevant experimental
data. We were unable to find any experimental evidence for
either a continuous static switch or a discontinuous dynamic
switch in response to lactose, which comes as a surprise.
Despite the long history of study involving the lac operon,
such experiments apparently have not been reported. Experiments
to test this prediction specifically are currently being
designed and carried out (Atkinson and Ninfa, unpublished
results).

D.  Context  and  logic 

In the qualitative version of demand theory (Sec. IV A) it
was assumed for simplicity that there was a constant demand
regime for the effector gene in question and that its expression
was controlled by a single regulator. Here I review the
quantitative version of demand theory and include consideration
of genes exposed to more than one demand regime and
controlled by more than one regulator.

1. Life cycle provides the context for gene control

Models that include consideration of the organism’s life
cycle, molecular mode of gene control, and population dynamics
are used to describe mutant and wild-type populations
in two environments with different demands for expression
of the genes in question. These models are analyzed
mathematically in order to identify conditions that lead to
either selection or loss of a given mode of control. It will be
shown that this theory ties together a number of important
variables, including growth rates, mutation rates, minimum
and maximum demands for gene expression, and minimum
and maximum durations for the life cycle of the organism. A
test of the theory is provided by the lac operon of E. coli.

The life cycle of E. coli involves sequential colonization
of new host organisms,44 which means repeated cycling between
two different environments [Figs. 11(a) and 11(b)]. In
one, the upper portion of the host’s intestinal tract, the microbe
is exposed to the substrate lactose and there is a high
demand for expression of the lac operon; in the other,
the lower portion of the intestinal tract and the environment
external to the host, the microbe is not exposed to lactose
and there is a low demand for lac expression. The average
time to complete a cycle through these two environments is
defined as the cycle time, C, and the average fraction of the
cycle time spent in the high-demand environment is defined
as the demand for gene expression, D [Fig. 11(c)].

The implications for gene expression of mutant and
wild-type operons in the high- and low-demand environments
are as follows. The wild type functions by turning on
expression in the high-demand environment and turning off
expression in the low-demand environment. The mutant with
a defective promoter is unable to turn on expression in either
environment. The mutant with a defective modulator (or defective
regulator protein) is unable to turn off expression
regardless of the environment. The double mutant with defects
in both promoter and modulator/regulator behaves like
the promoter mutant and is unable to turn on expression in
either environment.
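
The expression rules for the four genotypes can be written as a small truth table. The following is an illustrative sketch only; the function `is_expressed` and the genotype labels are hypothetical names introduced here, and expression is treated as strictly on or off.

```python
def is_expressed(promoter_ok, modulator_ok, environment):
    """Return True if the operon is expressed.

    promoter_ok  -- False for a defective promoter
    modulator_ok -- False for a defective modulator or regulator protein
    environment  -- "high" or "low" demand
    """
    if not promoter_ok:
        # Promoter mutants (including the double mutant) cannot turn on.
        return False
    if not modulator_ok:
        # Modulator/regulator mutants cannot turn off.
        return True
    # The wild type follows the demand of the environment.
    return environment == "high"


GENOTYPES = {
    "wild type": (True, True),
    "promoter mutant": (False, True),
    "modulator mutant": (True, False),
    "double mutant": (False, False),
}

TABLE = {
    name: {env: is_expressed(p, m, env) for env in ("high", "low")}
    for name, (p, m) in GENOTYPES.items()
}

if __name__ == "__main__":
    for name, row in TABLE.items():
        print(f"{name:17s} high={row['high']!s:5s} low={row['low']!s:5s}")
```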

The sizes of the populations are affected by the transfer
rate between populations, which is the result of mutation,
and by the growth rate, which is the result of overall fitness.
The transfer rates depend on the mutation rate per base and
on the size of the relevant target sequence. The growth rate
for the wild type is greater than that for mutants of the modulator
type in the low-demand environment; these mutants are
selected against because of their superfluous expression of an
unneeded function. The growth rate for the wild type is
greater than that for mutants of the promoter type in the
high-demand environment; these mutants are selected
against because of their inability to express the needed function.
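
The interplay of mutation and selection described above can be illustrated with a toy simulation. This is a deliberately simplified sketch and not the dynamic equations of the theory: it tracks only the wild type and the modulator mutant, uses discrete time steps, and all names and numerical values (`simulate`, `s_low`, `mu`, the growth factor of 1.1) are hypothetical choices made for illustration.

```python
def simulate(cycles, demand, s_low, mu, steps_per_cycle=100):
    """Return the final fraction of modulator mutants.

    cycles -- number of passages through the two environments
    demand -- fraction of each cycle spent in the high-demand environment
    s_low  -- assumed growth penalty of the mutant in the low-demand environment
    mu     -- assumed per-step transfer (mutation) rate, wild type -> mutant
    """
    wild, mutant = 1.0, 0.0
    for _ in range(cycles):
        for step in range(steps_per_cycle):
            high_demand = step < demand * steps_per_cycle
            growth_wild = 1.1
            # The mutant pays for superfluous expression only when demand is low.
            growth_mutant = 1.1 if high_demand else 1.1 * (1.0 - s_low)
            transferred = wild * mu
            wild = (wild - transferred) * growth_wild
            mutant = (mutant + transferred) * growth_mutant
            total = wild + mutant
            wild, mutant = wild / total, mutant / total  # keep fractions
    return mutant


if __name__ == "__main__":
    for d in (0.5, 1.0):
        print(f"demand={d}: mutant fraction = {simulate(50, d, 0.05, 1e-3):.3f}")
```

In this toy model the mutant remains rare when part of the cycle is spent in the low-demand environment, whereas it accumulates when demand is continuous and nothing selects against it; this is only a qualitative illustration of why a threshold for selection exists.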

Solution of the dynamic equations for the populations
cycling through the two environments yields expressions in
C and D for the threshold, extent, and rate of selection that
apply to the wild-type control mechanism.45 The threshold
for selection is given by the boundary of the shaded region in
Fig. 11(d); only systems with values of C and D that fall
within this region are capable of being selected. The rate and
extent of selection shown in Figs. 11(e) and 11(f) exhibit
optimum values for a specific value of D.

FIG. 11. Life cycle of Escherichia coli and the demand for expression of its
lac operon. (a) Schematic diagram of the upper (high demand) and lower
(low demand) portions of the human intestinal tract. (b) Life cycle consists
of repeated passage between environments with high and low demand for
lac gene expression. (c) Definition of cycle time C and demand for gene
expression D. (d) Region in the C vs D plot for which selection of the
wild-type control mechanism is possible. (e) Rate of selection as a function
of demand. (f) Extent of selection as a function of demand. See text for
discussion.

Application of this quantitative demand theory to the lac
operon of E. coli yields several new and provocative predictions
that relate genotype to phenotype.46 The straight line in
Fig. 11(d) represents the inverse relationship C=3/D that
results from fixing the time of exposure to lactose at 3 h,
which is the clinically determined value for humans.47,48 The
intersections of this line with the two thresholds for selection
provide lower and upper bounds on the cycle time. The
lower bound is approximately 24 h, which is about as fast as
the microbe can cycle through the intestinal tract without
colonization.49-51 The upper bound is approximately 70
years, which is the longest period of colonization without
cycling and corresponds favorably with the maximum life
span of the host.52 The optimum value for the cycle time, as
determined by the optimum value for demand [from Figs.
11(e) and 11(f)], is approximately four months, which is
comparable to the average rate of recolonization measured in
humans.53-55 A summary of these results is given in Table V.
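
The relationship C=3/D makes it easy to translate the cycle times quoted above into values of demand. The sketch below does only this arithmetic; the function name and the conversions (365-day year, 30-day month) are assumptions introduced for illustration.

```python
HOURS_PER_DAY = 24.0
HOURS_PER_MONTH = 30.0 * HOURS_PER_DAY   # assumed 30-day month
HOURS_PER_YEAR = 365.0 * HOURS_PER_DAY   # assumed 365-day year


def demand_from_cycle_time(cycle_hours, exposure_hours=3.0):
    """Demand D = (time exposed to lactose) / (cycle time C), i.e., D = 3/C."""
    return exposure_hours / cycle_hours


CYCLE_TIMES = {
    "lower bound (about 24 h)": 24.0,
    "optimum (about 4 months)": 4.0 * HOURS_PER_MONTH,
    "upper bound (about 70 years)": 70.0 * HOURS_PER_YEAR,
}

DEMANDS = {label: demand_from_cycle_time(c) for label, c in CYCLE_TIMES.items()}

if __name__ == "__main__":
    for label, d in DEMANDS.items():
        print(f"{label:30s} D = {d:.3g}")
```

Under these conversions the three cycle times correspond to demands of roughly 0.1, 0.001, and 0.000005, which shows that the region of selection in Fig. 11(d) spans several orders of magnitude in D.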

2.  Logic  unit  and  phasing  of  lac  control 

The analysis in Sec. IV D 1 assumed that when E. coli
was growing on lactose there was no other more preferred
carbon source present. Thus, the positive CAP-cAMP
regulator56 was always present, and we could then concentrate
on the conditions for selection of the specific control by
Lac repressor. This was a simplifying assumption; in the
more general situation, both the specific control by Lac repressor
and the global control by CAP-cAMP activator must
be taken into consideration. The analysis becomes more
complex, but it follows closely the outline of the simpler
case in Sec. IV D 1.

TABLE V. Summary of experimental data and model predictions based on
conditions for selection of the lac operon in Escherichia coli.

Characteristic            Experimental data   Model predictions
Intestinal transit time   12-48 h             26 h
Lifetime of the host      120 years           66 years
Re-colonization rate      2-18 months         4 months

By extension of the definition for demand D, given in
Sec. IV D 1, one can define a period of demand for the absence
of repressor G, a period of demand for the presence of
activator E, and a phase relationship between these two periods
of demand F. By extension of the analysis in Sec.
IV D 1, solution of the dynamic equations for wild-type and
mutant populations cycling through the two environments
yields expressions in C, G, E, and F for the threshold, extent,
and rate of selection that apply to the wild-type control
mechanism.

The threshold for selection is now an envelope surrounding
a “mound” in four-dimensional space with cycle time C
as a function of the three parameters G, E, and F; only systems
with values that fall within this envelope are capable of
being selected. The rate and extent of selection exhibit optimum
values as before, but these now occur with a specific
combination of values for G, E, and F. The values of G, E,
and F that yield the optima represent a small period when
repressor is absent, an even smaller period when activator is
absent, and a large phase period between them. The period
when repressor is absent corresponds to the period of exposure
to lactose (~0.36% of the cycle time). Within this period
(but shifted by ~0.20% of the cycle time) there is a
shorter period when activator is absent (~0.14% of the cycle
time); this corresponds to the presence of a more preferred
carbon source that lowers the level of cAMP.

These relationships can be interpreted in terms of exposure
to lactose, exposure to glucose, and expression of the
lac operon as shown in Fig. 12. As E. coli enters a new host,
passes through the early part of the intestinal tract, and is
exposed to lactose, the lac operon is induced and the bacteria
are able to utilize lactose as a carbon source. During this
period the operator site of the lac operon is free of the Lac
repressor. At the point in the small intestine where the host’s
lactase enzymes are localized, lactose is actively split into its
constituent sugars, glucose and galactose. This creates a
rapid elevation in the concentration of these sugars in the
environment of E. coli. A period of growth on glucose is
initiated, and this is accompanied by catabolite repression
and lactose exclusion from the bacteria. During this period
the initiator site of the lac operon is free of the CAP-cAMP
activator, transcription of the lac operon ceases, and the concentration
of β-galactosidase is diluted by growth. During
this period the glucose in the intestine is also rapidly absorbed
by the host. Eventually, the glucose is exhausted, the
CAP-cAMP activator again binds the initiator site of the lac
operon, and the residual lactose that escaped hydrolysis by
the host’s lactase enzymes causes a diminished secondary
induction of the bacterial lac operon. Finally, the lactose is
exhausted, the Lac repressor again binds the operator site of
the lac operon, and the microbe enters the low-demand environment
and colonizes the host.

FIG. 12. Optimal duration and phasing of the action by the positive (CAP-cAMP)
and negative (LacI) regulators of β-galactosidase expression. The
signal on the top line represents the absence of repressor binding to the lac
operator site; the signal on the second line represents activator binding to the
lac initiator site. The cycle time C is the period between the vertical lines,
and the relative phasing is shown as F. An expanded view of the critical
region gives an interpretation in terms of exposure to lactose and glucose as
bacteria pass the site of the lactase enzymes in the small intestine. See text
for discussion.
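
The sequence of events just described amounts to a two-input logic unit: transcription requires that the repressor be absent and that the activator be bound. The sketch below encodes this as Boolean logic; it is an idealization with hypothetical names (`lac_transcription`, `STAGES`) that ignores the graded nature of induction, including the diminished level of the secondary induction.

```python
def lac_transcription(lactose_present, glucose_present):
    """Idealized logic of combined control of the lac operon.

    The Lac repressor leaves the operator when lactose is present; the
    CAP-cAMP activator binds the initiator site when glucose is absent.
    """
    repressor_bound = not lactose_present
    activator_bound = not glucose_present
    return activator_bound and not repressor_bound


# (stage, lactose present, glucose present), in the order described in the text.
STAGES = [
    ("early intestine, exposure to lactose", True, False),
    ("site of host lactase, glucose released", True, True),
    ("glucose absorbed, residual lactose", True, False),
    ("lactose exhausted, low-demand environment", False, False),
]

TIMELINE = [lac_transcription(lac, glc) for _, lac, glc in STAGES]

if __name__ == "__main__":
    for (stage, _, _), on in zip(STAGES, TIMELINE):
        print(f"{stage:45s} transcription {'ON' if on else 'OFF'}")
```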

The quantitative version of demand theory integrates information
at the level of DNA (mutation rate, effective target
sizes for mutation of regulatory proteins, promoter sites, and
modulator sites), physiology (selection coefficients for superfluous
expression of an unneeded function and for lack of
expression of an essential function), and ecology (environmental
context and life cycle) and makes rather surprising
predictions connected to the intestinal physiology, life span,
and recolonization rate of the host. There is independent experimental
data to support each of these predictions.

Finally, when the logic of combined control by CAP-cAMP
activator and Lac repressor was analyzed, we found
an optimum set of values not only for the exposure to lactose,
but also for the exposure to glucose and for the relative
phasing between these periods of exposure. The phasing predicted
is consistent with the spatial and temporal environment
created by the patterns of disaccharide hydrolysis and
monosaccharide absorption along the intestinal tract of the
host.

V.  DISCUSSION 

Although biological principles that govern some variations
in design have been identified (e.g., positive vs negative
modes of control), there are other well-documented (and
many not so well-documented) variations in design that still
are not understood. For example, why is the positive mode of
control in some cases realized with an activator protein that
facilitates transcription of genes downstream of a promoter,
and in other cases with an antiterminator protein that facilitates
transcription of genes downstream of a terminator?
There are many examples of each, but no convincing explanation
for the difference. Thus, the elements of design and
the variations I have reviewed in Sec. II provide only a start;
there is much to be done in this area.

For the comparative analysis of alternative designs we
require a formalism capable of representing diverse designs,
tractable methods of analysis for characterizing designs, and
a strategy for making well-controlled comparisons that reveal
essential differences while minimizing extraneous differences.
As reviewed in Sec. III, there are several arguments
that favor the power-law formalism for representing a wide
spectrum of nonlinear systems. In particular, the local
S-system representation within this formalism not only provides
reasonably accurate descriptions but also possesses a
tractable structure, which allows explicit solutions for the
steady state and efficient numerical solutions for the dynamics.
Explicit steady-state solutions are used to make mathematically
controlled comparisons. Constraining these solutions
provides invariants that eliminate extra degrees of
freedom, which otherwise would introduce extraneous differences
into the comparison of alternatives. The ability to provide
such invariants is one of the principal advantages of
using the local S-system representation. Two other formalisms
with this property are the linear representation and the
Volterra-Lotka representation, which is equivalent to the
linear representation for the steady state. However, these representations
yield linear relations between variables in steady
state, which is less appropriate for biological systems in
which these relationships are typically nonlinear.
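
The explicit steady-state solution of an S-system follows from the fact that the steady-state equations become linear in the logarithms of the variables. The sketch below illustrates this for the regulated region of the inducible circuit in Sec. IV C with the inducer in the substrate position; the helper names, the unit rate constants, and the kinetic orders used in the example are illustrative assumptions, and the basal and maximal plateaus of the piecewise model are ignored.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique steady state")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def s_system_steady_state(alpha, beta, g, h, g_ind, x_ind):
    """Steady state of dXi/dt = alpha_i*prod(Xj^g_ij)*X_ind^g_ind_i - beta_i*prod(Xj^h_ij).

    In logarithms: sum_j (g_ij - h_ij) y_j = ln(beta_i/alpha_i) - g_ind_i ln(X_ind).
    """
    n = len(alpha)
    a = [[g[i][j] - h[i][j] for j in range(n)] for i in range(n)]
    b = [math.log(beta[i] / alpha[i]) - g_ind[i] * math.log(x_ind) for i in range(n)]
    return [math.exp(y) for y in solve_linear(a, b)]


# Example (assumed values): g13 = 2, g32 = 0, h32 = h33 = 1, g34 = 1, X4 = 2.
G = [[0, 0, 2], [1, 0, 0], [0, 0, 0]]   # kinetic orders in synthesis terms
H = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]   # kinetic orders in degradation terms
X1, X2, X3 = s_system_steady_state([1, 1, 1], [1, 1, 1], G, H, [0, 0, 1], 2.0)

if __name__ == "__main__":
    print(f"X1 = {X1:.4f}, X2 = {X2:.4f}, X3 = {X3:.4f}")
```

For these assumed values the log-linear system gives $y_3 = (\ln 2)/3$ and $y_1 = y_2 = 2y_3$, so the steady state is obtained without any iterative search.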

The utility of these methods for studying alternative designs
ultimately will be determined by the degree to which
their predictions are supported by experimental evidence.
For this reason it is important that the methods consider an
entire class of systems without specifying numerical values
for the parameters, which often are unknown in any case.
Predictions achieved with this approach then can be tested
against numerous examples provided by members of the
class. If the methods were to focus upon a single system with
specific values for its parameters, then there would be only
the one example to test any hypothesis that might be conceived.
The symbolic approach also allows one to compare
efficiently many alternatives including ones that no longer
exist (and so values of their parameters will never be
known), which often is the case in trying to account for the
evolution of a given design, or that hypothetically might be
brought into existence through genetic engineering. The four
design principles reviewed in Sec. IV illustrate the types of
results that have been obtained when the methods in Sec. III
are applied to some of the elements of design described in
Sec. II.

First, we examined the two modes of control in elementary
gene circuits (Sec. IV A). Qualitative arguments and examples
were used to demonstrate the validity of demand
theory for the regulator-modulator component of control
mechanisms. The same approach also can be used to account
for the alternative forms of the promoter component. In either
case, the qualitative arguments are based on extreme
cases where the demand is clearly high or low. One would
like to quantify what is meant by demand, to know how high
it must be to select for the positive mode of control or a
low-level promoter, and to know how low it must be to select
for the negative mode of control or a high-level promoter.
The quantitative version of demand theory reviewed
in Sec. IV D specifically addresses these issues.

Second, we examined the three patterns of coupling in
elementary gene circuits (Sec. IV B). It was their dynamic
properties that proved to be distinctive. Establishing the dynamic
differences required efficient numerical solutions of
the differential equations and a means to reduce the dimension
of the search in order to explore fully the parameter
space. The results in Sec. IV B illuminate an area of experimental
work that needs greater attention. For example, the
data in Fig. 9 were obtained from individual gene circuits as
a result of labor-intensive studies designed for purposes
other than quantitative characterization of the steady-state
induction characteristics for effector and regulator cascades.
The data often are sketchy and subject to large errors, particularly
in the case of regulator proteins, which generally are
expressed at very low levels. Genomic and proteomic approaches
to the measurement of expression should provide
data for a much larger number of elementary gene circuits.
However, these approaches also have difficulty measuring
low levels of expression, and so technical improvements will
be needed before they will be able to quantify expression of
regulator genes.

Third, we examined various forms of connectivity that
link the inducer to the transcription unit for an inducible
catabolic pathway and showed that two different types of
switching behavior result (Sec. IV C). The analysis of lac
circuitry in this regard focused attention on a long-standing
misconception in the literature, namely, that lac operon expression
normally is an all-or-none phenomenon. While continuously
variable induction of the lactose operon might be
appropriate for a catabolic pathway whose expression can
provide benefits to the cell even when partially induced, a
discontinuous induction with hysteresis might be more appropriate
for major differentiation events that require a definite
commitment at some point. The wider the hysteretic
loop, the greater the degree of commitment. The width of the
loop tends to increase with a large capacity for induction
(ratio of maximum to basal level of expression), high logarithmic
gain in the regulatable region (high degree of cooperativity),
and substrates for the enzymes in the pathway operating
as near saturation as compatible with switching.

Fourth, we examined the context of gene expression and
developed a quantitative version of demand theory (Sec.
IV D). In addition to providing a quantitative measure of
demand, the results define what high and low mean in terms
of the level of demand required to select for the positive or
the negative mode of control and for low- or high-level promoters.
This analysis also predicted new and unexpected
kinds of information, such as intestinal transit time, host lifetime,
and recolonization rate. When the logic unit involving
the two relevant regulators was included in the analysis it
also yielded predictions for the relative phasing of the environmental
cues involved in lac operon induction.

Is there anything common to these successful explanations
of design that might be useful as a guide in exploring
other variations in design? Two such features come to mind.
First, each of the examples involved a limited number of
possible variations on a theme: two modes of control, three
patterns of coupling, two types of switches. This meant that
only a small number of cases had to be analyzed and compared,
which is a manageable task. If there had been many
variations in each case, then one would have no hope of
finding a simple underlying rule that could account for all the
variations, and one might never have considered analyzing
and comparing all of the possibilities. Second, each case
could be represented by a set of simple equations whose
structure allowed symbolic analysis (and exhaustive numerical
analysis when necessary). This permitted the use of controlled
mathematical comparisons, which led to the identification
of clear qualitative differences in the behavior of the
alternatives. Thus, it might prove fruitful in the future to look
for instances where these features present themselves.

In  this  context,  we  must  acknowledge  the  fundamental 
role  of  accident  in  generating  the  diversity  that  is  the  sub¬ 
strate  for  natural  selection.  Thus,  there  undoubtedly  will  be 
examples  of  recently  generated  variations  in  design  for 
which  there  will  be  no  rational  explanation.  Only  in  time  will 
natural  selection  tend  to  produce  designs  that  are  shaped  for 
specific  functions  and  hence  understandable  in  principle. 

Finally,  will  the  understanding  of  large  gene  networks 
require  additional  tools  beyond  those  needed  for  elementary 
gene  circuits?  Although  we  have  no  general  answer  to  this 
question,  there  are  three  points  having  to  do  with  network 
connectivity,  catalytic  versus  stoichiometric  linkages,  and 
time-scale  separation  that  are  worthy  of  comment. 

First,  the  evidence  suggests,  at  least  for  bacteria,  that 
there  are  relatively  few  connections  between  elementary 
gene  circuits  (see  Sec.  II D).  This  probably  explains  the  ex¬ 
perimental  success  that  has  been  obtained  by  studying  the 
regulation  of  isolated  gene  systems.  Had  there  been  rich  in¬ 
teractions  among  these  gene  systems,  such  studies  might 
have  been  less  fruitful.  Low  connectivity  also  suggests  that 
the  understanding  of  elementary  circuits  may  largely  carry 
over  to  their  role  in  larger  networks  and  that  the  same  tools 
might  be  used  to  study  larger  networks. 

Second, catalytic linkages between circuits are less problematic
than stoichiometric linkages, at least for the analysis
of  steady-state  behavior.  Elementary  circuits  can  be  linked 
catalytically  without  their  individual  properties  changing  ap¬ 
preciably,  because  the  molecules  in  one  circuit  acting  cata¬ 
lytically  on  another  circuit  are  not  consumed  in  the  process 
of  interaction.  Such  a  circuit  can  have  a  unilateral  effect  on  a 
second  circuit,  without  having  its  own  behavior  affected  in 
the  process.  This  permits  a  modular  block-diagram  treat¬ 
ment,  which  makes  use  of  the  results  obtained  for  the  indi¬ 
vidual  circuits  in  isolation,  to  characterize  the  larger  net¬ 
work.  (This  is  analogous  to  the  well-known  strategy  used  by 
electronic  engineers,  who  design  operational  amplifiers  with 
high impedance to insulate the properties of the modules
being  coupled.)  On  the  other  hand,  elementary  circuits  that 
are  linked  stoichiometrically  may  not  be  treatable  in  this 
fashion,  because  the  molecules  in  one  circuit  are  consumed 
in  the  process  of  interacting  with  a  second  circuit.  This  is  a 
much  more  intimate  linkage  that  may  require  the  two  circuits 
to  be  studied  as  a  whole.  In  either  case,  the  dynamic  proper¬ 
ties  are  not  easily  combined  in  general  because  the  circuits 
are  nonlinear. 

Third, the separation of time scales allows some elementary
circuits to be represented by transfer functions consisting
of a simple power-law function. (Allometric relationships
are an example of this.) This is related to the telescopic
property of the S-system representation that was mentioned
in Sec. III B 1. This property allows a simple block-diagram
treatment of the elementary circuits that operate on a fast
time scale.
ABSTRACT We have determined the effects of control by overall feedback inhibition on the systemic behavior of unbranched
metabolic pathways with an arbitrary pattern of other feedback inhibitions by using a recently developed numerical
generalization  of  Mathematically  Controlled  Comparisons,  a  method  for  comparing  the  function  of  alternative  molecular 
designs.  This  method  allows  the  rigorous  determination  of  the  changes  in  systemic  properties  that  can  be  exclusively 
attributed  to  overall  feedback  inhibition.  Analytical  results  show  that  the  unbranched  pathway  can  achieve  the  same 
steady-state  flux,  concentrations,  and  logarithmic  gains  with  respect  to  changes  in  substrate,  with  or  without  overall  feedback 
inhibition.  The  analytical  approach  also  shows  that  control  by  overall  feedback  inhibition  amplifies  the  regulation  of  flux  by  the 
demand  for  end  product  while  attenuating  the  sensitivity  of  the  concentrations  to  the  same  demand.  This  approach  does  not 
provide  a  clear  answer  regarding  the  effect  of  overall  feedback  inhibition  on  the  robustness,  stability,  and  transient  time  of  the 
pathway.  However,  the  generalized  numerical  method  we  have  used  does  clarify  the  answers  to  these  questions.  On  average, 
an  unbranched  pathway  with  control  by  overall  feedback  inhibition  is  less  sensitive  to  perturbations  in  the  values  of  the 
parameters  that  define  the  system.  The  difference  in  robustness  can  range  from  a  few  percent  to  fifty  percent  or  more, 
depending  on  the  length  of  the  pathway  and  on  the  metabolite  one  considers.  On  average,  overall  feedback  inhibition 
decreases  the  stability  margins  by  a  minimal  amount  (typically  less  than  5%).  Finally,  and  again  on  average,  stable  systems 
with  overall  feedback  inhibition  respond  faster  to  fluctuations  in  the  metabolite  concentrations.  Taken  together,  these  results 
show  that  control  by  overall  feedback  inhibition  confers  several  functional  advantages  upon  unbranched  pathways.  These 
advantages  provide  a  rationale  for  the  prevalence  of  this  control  mechanism  in  unbranched  metabolic  pathways  in  vivo. 


INTRODUCTION 

Biochemical  control  systems  have  been  studied  for  more 
than  45  years.  The  discovery  of  control  by  molecular  feed¬ 
back  inhibition  in  biochemical  pathways  was  initially  made 
in  unbranched  biosynthetic  pathways  (Umbarger,  1956; 
Yates  and  Pardee,  1956).  In  these  pathways,  the  most  com¬ 
mon  pattern  of  control  is  inhibition  of  the  initial  reaction  by 
the  final  product  of  the  pathway  (end-product  inhibition  or 
overall  feedback  inhibition). 

There  are  several  criteria  for  the  functional  effectiveness 
of  control  in  such  pathways  that  can  be  used  to  evaluate  the 
biological  significance  of  the  overall  feedback  inhibition 
mechanism.  A  biochemical  pathway  should  be  robust,  i.e.,  it 
should  function  reproducibly  despite  perturbations  in  the 
values  of  the  parameters  that  define  the  structure  of  the 
system.  The  operating  point  (state)  of  the  system  should  be 
stable  so  that  the  system  returns  to  the  steady  state  following 
small  random  fluctuations  in  the  values  of  the  dependent 
variables;  if  not,  the  system  tends  to  be  dysfunctional  be¬ 
cause  spurious  environmental  fluctuations  will  lead  to  loss 
of  the  steady  state.  The  flux  through  the  pathway  should  be 
responsive  to  changes  in  the  demand  for  the  final  product. 
This ensures that the amount of material flowing through
the  pathway  is  intimately  coupled  to  the  metabolic  needs  of 
the  cell.  Finally,  the  system  should  be  temporally  responsive 
to  changes,  because,  otherwise,  the  system  is  unlikely  to  be 
competitive  in  rapidly  changing  environments.  [A  more 
extensive  discussion  of  these  criteria  and  their  quantifica¬ 
tion  can  be  found  in  Savageau  (1976)  and  Hlavacek  and 
Savageau  (1997).] 

There  have  been  several  studies  focused  on  the  effect  of 
control  by  overall  feedback  inhibition  on  the  stability  of 
unbranched  pathways.  In  general,  the  first  enzyme  of  the 
pathway  is  considered  to  be  allosteric,  whereas  the  others 
are  considered  to  be  Michaelian  (e.g.,  Goodwin,  1963; 
Morales  and  McKay,  1967;  Walter,  1969a,b,  1970; 
Viniegra-Gonzalez,  1973;  Hunding,  1974;  Rapp,  1976;  Di- 
brov  et  al.,  1981).  The  stability  of  an  unbranched  pathway 
with  overall  feedback  inhibition  and  enzymes  confined  to 
one  of  two  spatial  compartments  with  diffusion  between 
compartments  has  been  studied  by  Costalat  and  Burger 
(1996).  They  found  that  stability  can  be  increased  by  this 
type  of  compartmentation.  These  studies  considered  path¬ 
ways  with  no  internal  feedback  inhibitions. 

Several  other  patterns  involving  control  by  inhibitory 
feedback  can,  in  principle,  perform  the  same  qualitative 
functions  as  overall  feedback  inhibition.  One  such  pattern 
is,  for  example,  a  sequence  of  feedback  inhibitions  in  which 
each  intermediate  inhibits  the  reaction  that  immediately 
precedes it (Koch, 1967). Other patterns of internal feedback
inhibition can be found by searching either the literature
or some of the databases for metabolism that are
burgeoning on the world wide web (e.g., KEGG:
http://www.genome.ad.jp/kegg/; ECOCYC:
http://ecocyc.PangeaSystems.com/ecocyc/server.html; PUMA:
http://www.unix.mcs.anl.gov/compbio/PUMA/Production/puma_graphics.html;
EMP: http://wit.mcs.anl.gov/EMP/). However, even when intermediate
feedback inhibition patterns exist, control by overall
feedback inhibition remains a prevalent theme in biosynthetic
pathways.

Savageau  (1972,  1974,  1975,  1976)  studied  the  function 
of  various  patterns  of  feedback  inhibition  and  explained  the 
prevalence  of  control  by  overall  feedback  inhibition  by 
using  arguments  based  on  selection.  He  assumed  that  the 
design  of  a  pathway  is  selected  to  optimize  certain  systemic 
characteristics,  and  then  compared  those  systemic  charac¬ 
teristics  in  unbranched  pathways  with  overall  feedback  in¬ 
hibition  to  the  same  characteristics  in  pathways  with  alter¬ 
native  inhibitory  feedback  designs.  He  showed  that  the 
pathway  with  control  by  overall  feedback  inhibition  is  more 
robust,  i.e.,  less  sensitive  to  perturbations  in  parameter 
values  than  the  pathway  with  many  alternative  designs 
(Savageau,  1974). 

The  stability  of  cases  with  control  by  internal  feedback 
inhibitions  has  also  been  examined  (e.g.,  Savageau,  1976; 
Thron,  1991a,b;  Demin  and  Kholodenko,  1993).  These  au¬ 
thors  found  that  systems  with  internal  feedback  inhibitions 
have  larger  stability  margins  than  systems  without  these 
interactions.  They  also  determined  that,  for  systems  without 
internal  feedback  inhibition,  control  by  overall  feedback 
inhibition  decreases  the  stability  margins  of  the  pathway. 


In  this  paper,  we  consider  unbranched  pathways  with  all 
possible  patterns  of  internal  feedback  inhibitions  (the  “fully- 
wired”  case)  and  use  all  of  the  criteria  mentioned  above  to 
determine  the  biological  significance  of  control  by  overall 
feedback  inhibition  in  such  pathways.  We  use  a  technique 
called  Mathematically  Controlled  Comparison  that  was 
originally  developed  to  determine  irreducible  qualitative 
differences  in  systemic  behavior  of  models  with  alternative 
regulatory  designs  for  the  same  network  of  reactions  (Sav¬ 
ageau,  1972,  1976;  Irvine  and  Savageau,  1985).  This  qual¬ 
itative  technique  requires  the  existence  of  closed-form  so¬ 
lutions  for  the  steady  state.  Such  solutions  can  be  obtained 
by  using  the  local  S-system  representation  to  characterize 
the  pathway  of  interest.  Important  functional  constraints  are 
introduced  by  equating  relevant  steady-state  properties  of 
the  alternative  systems  being  compared.  The  limitations  of 
this  technique  have  been  overcome  by  a  recently  developed 
generalization  that  uses  numerical  methods  to  obtain  results 
that  are  general  in  a  statistical  sense  (Alves  and  Savageau, 
2000a). 

METHODS 

Alternative  models  and  key  systemic  properties 

Consider the unbranched pathways depicted in Fig. 1. The independent
variable $X_{n+1}$ represents the cell’s demand for the end product $X_n$. If the
cell requires large amounts of $X_n$, then the value of $X_{n+1}$ will be high; if
small amounts of $X_n$ are required, then the value of $X_{n+1}$ will be low. The
dynamic behavior of such systems can be described in principle by a set of
ordinary differential equations. There is no generic representation of these
equations that can provide a globally accurate description of the behavior
[see Appendix]. However, the set of equations can be approximated to the
first order in logarithmic space (Savageau, 1969), yielding ordinary differential
equations with the canonical form of an S-system (Savageau, 1996).
This representation has a solid theoretical foundation based on Taylor’s
theorem. Thus, the validity of the results is guaranteed within some
neighborhood of the nominal steady-state operating point. The size of this
neighborhood cannot be specified in general, because it depends on the
characteristics of each individual system.

FIGURE 1 (A) Model of an unbranched pathway with all possible inhibitory feedback interactions (reference model). (B) Model of an unbranched
pathway with all possible inhibitory feedback interactions except overall feedback inhibition (alternative model). The horizontal arrows represent
biochemical reactions, whereas the vertical arrows represent inhibitory feedback interactions.

For  pathways  with  n  intermediates,  the  general  case  in  which  all 
possible  feedback  inhibitions  exist  (Fig.  1  A)  can  be  described  in  the  local 
S-system  representation  as 

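$$\frac{dX_1}{dt} = \alpha_1 \prod_{j=0}^{n} X_j^{g_{1j}} - \alpha_2 \prod_{j=1}^{n} X_j^{g_{2j}}, \tag{1}$$

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=i-1}^{n} X_j^{g_{ij}} - \alpha_{i+1} \prod_{j=i}^{n} X_j^{g_{i+1,j}}, \tag{2}$$

$$\frac{dX_n}{dt} = \alpha_n \prod_{j=n-1}^{n} X_j^{g_{nj}} - \alpha_{n+1} \prod_{j=n}^{n+1} X_j^{g_{n+1,j}}. \tag{3}$$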

The  corresponding  case  without  overall  feedback  inhibition  (Fig.  1  B)  can 
be  described  by  the  same  set  of  equations,  except  that  Eq.  1  is  replaced  by 


$$\frac{dX_1}{dt} = \alpha'_1 \prod_{j=0}^{n-1} X_j^{g'_{1j}} - \alpha_2 \prod_{j=1}^{n} X_j^{g_{2j}}. \tag{4}$$

The  rate  law  for  each  reaction  is  represented  by  a  simple  product  of 
power-law  functions.  The  values  for  the  parameters  in  this  representation 
can  be  determined  directly  from  conventional  experimental  measurements 
of  initial  rate  as  a  function  of  reactant  and  modifier  concentrations  (Sav¬ 
ageau,  1976).  The  range  of  values  for  the  concentrations  is  chosen  to 
sample  the  region  about  the  nominal  steady  state  of  interest. 

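The parameters are defined according to Taylor's theorem as

$$g_{ij} = \left(\frac{\partial V_i}{\partial X_j}\right)_0 \frac{X_{j0}}{V_{i0}}, \tag{5}$$

$$\alpha_i = V_{i0} \prod_{j=i-1}^{n} X_{j0}^{-g_{ij}}, \tag{6}$$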


where  the  additional  subscript  zero  signifies  that  the  variables  and  their 
derivatives  are  evaluated  at  the  steady-state  operating  point.  The  definition 
of  these  parameters  allows  them  to  be  directly  related  to  the  parameters  in 
other representations such as the traditional Michaelis-Menten representation.
In the simplest case of the Hill rate law,

$$V = \frac{V_M X^n}{K_M + X^n} \tag{7}$$

[and the irreversible Michaelis-Menten rate law ($n = 1$)], these relationships
are well known (Savageau, 1971a),

$$g = \frac{n K_M}{K_M + X_0^n}, \tag{8}$$

$$\alpha = V_0 X_0^{-g}. \tag{9}$$
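The relation between a Hill rate law and its local power-law parameters can be checked numerically. The following is a minimal sketch, assuming the Hill law is written as V = V_M X^n / (K_M + X^n); the function names and the numerical values are illustrative and are not taken from the paper. It computes the kinetic order and rate constant at an operating point and confirms that the power law reproduces both the rate and the log-log slope there.

```python
import math

def hill_rate(x, vmax, km, n):
    """Hill rate law, assumed form V = Vmax * x**n / (Km + x**n)."""
    return vmax * x**n / (km + x**n)

def power_law_params(x0, vmax, km, n):
    """Kinetic order g and rate constant alpha at the operating point x0."""
    v0 = hill_rate(x0, vmax, km, n)
    g = n * km / (km + x0**n)      # local slope in log-log coordinates
    alpha = v0 * x0**(-g)          # makes the power law pass through (x0, v0)
    return g, alpha

# Illustrative values (assumptions, not from the paper).
vmax, km, n, x0 = 10.0, 2.0, 2, 1.0
g, alpha = power_law_params(x0, vmax, km, n)

# The power law agrees with the Hill law at the operating point.
v_exact = hill_rate(x0, vmax, km, n)
v_power = alpha * x0**g

# Its exponent equals the log-log slope of the Hill law (central difference).
h = 1e-6
slope = (math.log(hill_rate(x0 * math.exp(h), vmax, km, n))
         - math.log(hill_rate(x0 * math.exp(-h), vmax, km, n))) / (2 * h)
```

Away from the operating point the two rate laws diverge, which is why the validity of the representation is only guaranteed in some neighborhood of the nominal steady state.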


The multiplicative parameters, $\alpha$, can be interpreted as rate constants
that are always positive. The exponential parameters, $g$, can be interpreted
as kinetic orders that represent the direct influence of each intermediate on
each rate law. If $X_j$ is directly involved in the rate law $V_i$, either as a
substrate or a modulator, and if an increase in $X_j$ causes an increase in the
rate $V_i$, then the kinetic order will be positive. If an increase in $X_j$ causes
a decrease in $V_i$, then the kinetic order will be negative. If $X_j$ is not directly
involved in $V_i$, then the kinetic order will be zero. The positive kinetic
orders in Eqs. 1-4 are $g_{i+1,i}$ ($0 < i \le n$) and $g_{10}$, because these are the
kinetic orders for substrates of reactions, and $g_{n+1,n+1}$, which, together
with $X_{n+1}$, represents the demand for the end product $X_n$. The remaining
kinetic orders, which represent feedback inhibitions, are negative.

At a steady state, the rate of production and the rate of consumption will
be equal for each intermediate, and Eqs. 1-3 reduce to the following matrix
equation (Savageau, 1969), which can be solved analytically.

$$\begin{bmatrix} b_1 - g_{10} Y_0 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n + g_{n+1,n+1} Y_{n+1} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n} \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{n-1} \\ Y_n \end{bmatrix} \tag{10}$$

where $b_1 = \log(\alpha_2/\alpha_1)$, $b_i = \log(\alpha_{i+1}/\alpha_i)$, $a_{ij} = g_{ij} - g_{i+1,j}$, and $Y_j = \log(X_j)$.
Eq. 10 is linear and therefore easily solved to obtain the steady-state
value for each $Y_j$, and then the corresponding value for each $X_j$ is
obtained by simple exponentiation. Eqs. 2-4 reduce to an identical matrix
equation, except that the parameters of the first row are primed and
$g'_{1n} = 0$.
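The steady-state calculation described above can be sketched in a few lines. This is an illustrative implementation only, assuming the definitions b_i = log(alpha_{i+1}/alpha_i) and a_ij = g_ij - g_{i+1,j}; the helper names (`solve_linear`, `steady_state`, `rates`) and all parameter values are hypothetical. The linear system is solved in logarithmic coordinates and the result is checked by confirming that every reaction in the pathway carries the same flux.

```python
import math

def solve_linear(a, b):
    """Solve a*y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y

def steady_state(alpha, g, x0, xd):
    """Steady-state X_1..X_n; alpha = [alpha_1..alpha_{n+1}], g = {(i, j): g_ij}."""
    n = len(alpha) - 1
    gij = lambda i, j: g.get((i, j), 0.0)
    a = [[gij(i, j) - gij(i + 1, j) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    b = [math.log(alpha[i] / alpha[i - 1]) for i in range(1, n + 1)]
    b[0] -= gij(1, 0) * math.log(x0)             # initial substrate X_0
    b[-1] += gij(n + 1, n + 1) * math.log(xd)    # demand variable X_{n+1}
    return [math.exp(y) for y in solve_linear(a, b)]

def rates(alpha, g, x):
    """Rates V_1..V_{n+1} for x = [X_0, X_1, ..., X_{n+1}]."""
    n = len(alpha) - 1
    v = []
    for i in range(1, n + 2):
        r = alpha[i - 1]
        for j in range(n + 2):
            r *= x[j] ** g.get((i, j), 0.0)
        v.append(r)
    return v

# Illustrative three-intermediate pathway (assumed values, not from the paper).
alpha = [2.0, 1.0, 1.5, 1.0]
g = {(1, 0): 1.0, (1, 3): -0.5,    # substrate X_0, overall feedback by X_3
     (2, 1): 0.5, (2, 2): -0.2,    # substrate X_1, one internal feedback
     (3, 2): 0.5,
     (4, 3): 0.5, (4, 4): 1.0}     # consumption of X_3, driven by demand X_4
x0, xd = 1.0, 2.0
xs = steady_state(alpha, g, x0, xd)
v = rates(alpha, g, [x0] + xs + [xd])
```

Setting the entry `(1, 3)` to zero and re-deriving the first-row parameters gives the corresponding alternative model without overall feedback.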

Two  types  of  systemic  coefficients,  logarithmic  gains  and  parameter 
sensitivities,  can  be  used  to  characterize  the  steady  state  of  such  models 
(Shiraishi  and  Savageau,  1992).  Logarithmic  gains  measure  the  relative 
influence  of  each  independent  variable  on  each  dependent  variable  of  the 
integrated  model.  For  example, 


$$L(X_i, X_0) = \frac{\partial \log(X_i)}{\partial \log(X_0)} = \frac{\partial Y_i}{\partial Y_0} \tag{11}$$

measures the percent change in the concentration of intermediate $X_i$ caused
by a percentage change in the concentration of the initial substrate $X_0$.
Logarithmic gains provide important information concerning the amplification
or attenuation of signals as they are propagated through the system.
The experimental measurement of a logarithmic gain involves the determination
of steady-state fluxes and concentrations at different values for
a given independent variable (Savageau, 1971a).

Parameter sensitivities measure the relative influence of each parameter
on each dependent variable of the model. For example,

$$S(X_i, p_j) = \frac{\partial \log(X_i)}{\partial \log(p_j)} = p_j \frac{\partial Y_i}{\partial p_j} \tag{12}$$

measures the percent change in the concentration of intermediate $X_i$ caused
by a percentage change in the value of the parameter $p_j$. Parameter
sensitivities  provide  important  information  about  system  robustness,  i.e., 
how  sensitive  the  system  is  to  perturbations  in  the  parameters  that  define 
the  structure  of  the  system.  Because  enzymes  usually  have  a  first-order 
influence  on  the  process  they  catalyze,  the  logarithmic  gain  in  flux  and  in 
each  concentration  with  respect  to  change  in  the  concentration  of  each 
enzyme  is  the  same  as  the  sensitivity  in  flux  and  in  each  concentration  with 
respect  to  change  in  the  rate  constant  of  the  corresponding  enzyme.  The 
experimental  measurement  of  a  parameter  sensitivity  involves  the  deter¬ 
mination  of  steady-state  fluxes  and  concentrations  before  and  after  chang¬ 
ing  the  value  of  a  parameter  by  mutation  or  other  means  (Savageau, 
1971b). 
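Both kinds of coefficient follow from the same linear steady-state equation by differentiation, which the sketch below illustrates for a pathway with two intermediates. It assumes the log-linear form [A] Y = b with a_ij = g_ij - g_{i+1,j}; the kinetic orders are illustrative values, and the variable names are hypothetical. Logarithmic gains come from differentiating b with respect to log(X_0) or log(X_3), and rate-constant sensitivities from differentiating b with respect to log(alpha_j).

```python
def inverse_2x2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Illustrative kinetic orders for a two-intermediate pathway (assumed values).
g10, g11, g12 = 1.0, -0.1, -0.5   # reaction 1: substrate X_0, feedback by X_1, X_2
g21, g22 = 0.5, -0.2              # reaction 2: substrate X_1, feedback by X_2
g32, g33 = 0.5, 1.0               # reaction 3: substrate X_2, demand X_3

# a_ij = g_ij - g_{i+1,j}; kinetic orders that do not exist are zero.
A = [[g11 - g21, g12 - g22],
     [g21 - 0.0, g22 - g32]]
Ainv = inverse_2x2(A)

# Logarithmic gains: d b / d log(X_0) = [-g10, 0], d b / d log(X_3) = [0, g33].
gain_x0 = [Ainv[i][0] * (-g10) for i in range(2)]
gain_x3 = [Ainv[i][1] * g33 for i in range(2)]

# Rate-constant sensitivities: b_1 = log(alpha_2/alpha_1), b_2 = log(alpha_3/alpha_2).
db_dlog_alpha = {1: [-1.0, 0.0], 2: [1.0, -1.0], 3: [0.0, 1.0]}
sens = {j: [sum(Ainv[i][k] * db[k] for k in range(2)) for i in range(2)]
        for j, db in db_dlog_alpha.items()}
```

In this example the sensitivities of each concentration to the three rate constants sum to zero, reflecting the fact that scaling every rate constant by the same factor leaves the steady-state concentrations unchanged.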

Because  we  can  calculate  closed-form  steady-state  solutions  for  Eqs. 
1-3  and  2-4,  we  can  also  calculate  each  of  the  two  types  of  coefficients 
simply by taking the appropriate derivatives of those solutions. Although
the mathematical operations involved are the same in each case, it is
important to keep in mind that the biological significance of the two types
of coefficients is very different.

The  local  stability  of  the  steady  state  can  be  determined  by  applying  the 
Routh  criteria  (Dorf,  1992).  The  magnitude  of  the  two  critical  Routh 
conditions  can  be  used  to  quantify  the  margin  of  stability  (e.g.,  Savageau, 
1976). 
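As a concrete illustration of such a stability test, the sketch below evaluates the Routh-Hurwitz quantities for a pathway with three intermediates. It rests on several assumptions that are not in the text: the system is normalized so that the steady state lies at X_i = 1 with unit flux (so the Jacobian equals the matrix of a_ij = g_ij - g_{i+1,j}), every substrate kinetic order is 1, and overall feedback is the only inhibition. The function names are hypothetical, and which two conditions are treated as "critical" is an interpretation here.

```python
def routh_margins_3(j):
    """Routh-Hurwitz quantities for a 3x3 Jacobian j.

    With characteristic polynomial s^3 + c1*s^2 + c2*s + c3, the steady state
    is locally stable iff c1 > 0, c3 > 0 and c1*c2 - c3 > 0.
    Returns (c1, c3, c1*c2 - c3).
    """
    c1 = -(j[0][0] + j[1][1] + j[2][2])
    c2 = (j[0][0] * j[1][1] - j[0][1] * j[1][0]
          + j[0][0] * j[2][2] - j[0][2] * j[2][0]
          + j[1][1] * j[2][2] - j[1][2] * j[2][1])
    det = (j[0][0] * (j[1][1] * j[2][2] - j[1][2] * j[2][1])
           - j[0][1] * (j[1][0] * j[2][2] - j[1][2] * j[2][0])
           + j[0][2] * (j[1][0] * j[2][1] - j[1][1] * j[2][0]))
    c3 = -det
    return c1, c3, c1 * c2 - c3

def is_stable(j):
    return all(m > 0 for m in routh_margins_3(j))

def jacobian(g13):
    """Normalized Jacobian with overall feedback strength g13 (negative)."""
    return [[-1.0, 0.0, g13],
            [1.0, -1.0, 0.0],
            [0.0, 1.0, -1.0]]

weak = routh_margins_3(jacobian(-2.0))     # moderate overall feedback
strong = routh_margins_3(jacobian(-10.0))  # very strong overall feedback
```

For this normalized example the last margin equals 8 + g13, so the steady state loses stability once the overall feedback kinetic order is more negative than -8; the sizes of the positive margins indicate how far the system is from that boundary.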

The  use  of  the  S-System  formalism  allows  for  an  analytical  study  of  the 
dynamical  systems  at  steady  state.  Comparisons  of  systems  with  only  one 
feedback  inhibition  to  systems  without  feedback  regulation  can  be  done 
and  interpreted  in  a  fully  symbolic  way.  However,  for  comparisons  involv¬ 
ing  many  feedback  inhibitions,  numerical  values  must  be  introduced  for  the 
parameters  to  make  the  comparisons  interpretable.  The  steady-state  behav¬ 
ior  of  the  alternative  models  is  compared  with  respect  to  their  flux, 
intermediate  concentrations,  logarithmic  gains  with  respect  to  changes  in 
initial  substrate  and  demand  for  end  product,  robustness,  and  stability 
margins. The differential equations also are solved numerically to characterize
the temporal responsiveness of the alternative designs. To evaluate
this, we increase the steady-state concentration of each $X_i$ by 20% and
measure the time the system takes to relax back to within 1% of its original
steady state.
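A measurement of this kind can be sketched as follows for a pathway with two intermediates. The example is illustrative only: it assumes all rate constants and independent variables equal 1 (so the steady state is X_1 = X_2 = 1), it perturbs both intermediates at once, and it uses a fixed-step fourth-order Runge-Kutta integrator in place of the Mathematica routines used in the paper. The kinetic orders and function names are hypothetical.

```python
def derivs(x, g):
    """S-system right-hand side for two intermediates (alpha_i = X_0 = X_3 = 1)."""
    x1, x2 = x
    v1 = x1 ** g["11"] * x2 ** g["12"]   # production of X_1, inhibited by X_1, X_2
    v2 = x1 ** g["21"] * x2 ** g["22"]   # conversion of X_1 to X_2
    v3 = x2 ** g["32"]                   # consumption of X_2
    return [v1 - v2, v2 - v3]

def rk4_step(x, dt, g):
    """One classical Runge-Kutta step."""
    k1 = derivs(x, g)
    k2 = derivs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], g)
    k3 = derivs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], g)
    k4 = derivs([xi + dt * k for xi, k in zip(x, k3)], g)
    return [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def transient_time(g, bump=0.20, band=0.01, dt=0.01, t_max=100.0):
    """Last time at which any concentration is more than `band` from steady state."""
    x = [1.0 + bump, 1.0 + bump]         # 20% above the steady state X_i = 1
    t, last_outside = 0.0, 0.0
    while t < t_max:
        x = rk4_step(x, dt, g)
        t += dt
        if any(abs(xi - 1.0) > band for xi in x):
            last_outside = t
    return last_outside, x

# Illustrative kinetic orders (assumed values, not from the paper).
g = {"11": -0.1, "12": -0.5, "21": 0.5, "22": -0.2, "32": 0.5}
tau, x_final = transient_time(g)
```

Recording the last excursion outside the 1% band, rather than the first entry into it, avoids underestimating the transient time when the response is oscillatory.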


Calculating constraints for the mathematically
controlled comparison

Only the first step in the pathway is allowed to differ between the reference
model (Fig. 1 A) and the alternative model (Fig. 1 B). Therefore, to establish
“internal equivalence” (Savageau, 1972, 1976; Irvine, 1991) between
the two designs, we require the values for the corresponding parameters of
all other steps in the two models to be the same.

The first step of the pathway differs between the reference model and
the alternative model, and the degrees of freedom associated with this
difference must be eliminated to the extent possible. If we reason that loss
or gain of an inhibitory site on the first enzyme comes about by mutation,
and that this mutation can cause changes in all the parameters of the first
reaction, then a mutation causing loss of overall feedback inhibition would
change the parameters $\alpha_1$ and $g_{10}$ through $g_{1n}$ in Eq. 1 to the corresponding
primed parameters in Eq. 4. Clearly, the value of the parameter $g'_{1n}$, which
equals zero, differs from that of $g_{1n}$, which is nonzero. The remaining
primed parameters also will have values that, in general, are not equal to
the values of the corresponding parameters in the reference model. Because
we wish to determine those effects that are due solely to changes in the
structure of the system and not simply to arbitrary changes in the values of
parameters, we shall specify values for the primed parameters that minimize
all other effects. This can be accomplished by deriving the mathematical
expression for a given steady-state property in each of the two
models, equating these expressions, and then solving the constraint equation
for the value of a primed parameter. This process establishes an
“external equivalence” between the two designs (Savageau, 1972, 1976;
Irvine, 1991). After values for all the primed parameters have been specified
in terms of the known values for the reference system, the extra
degrees of freedom have been eliminated, and we can proceed with the
comparison.

Three classes of constraint equations are used to fix the values for the
k + 2 primed parameters when there are k interactions that feed back to the
first step of the alternative pathway. These are obtained by equating
steady-state logarithmic gains, concentrations, and parameter sensitivities
as described below.

First, equating the logarithmic gains for any one of the metabolites with
respect to change in the initial substrate,

$$L(X_i, X_0)_A = L(X_i, X_0)_B, \qquad i = 1, 2, \ldots, n, \tag{13}$$

which causes each of the other corresponding intermediates to have the
same logarithmic gain, specifies the value of the kinetic order $g'_{10}$ in terms
of known values for the reference system. This condition also makes the
corresponding logarithmic gain in flux the same for the two designs.

Second, equating the concentrations for any one of the metabolites in
the pathway,

$$X_{iA} = X_{iB}, \qquad i = 1, 2, \ldots, n, \tag{14}$$

which causes each of the corresponding intermediates to have the same
concentration, specifies the value of the rate constant $\alpha'_1$. This condition
also makes the flux the same for the two designs.

Finally, the remaining k − 1 primed parameters are fixed by equating
the rate-constant parameter sensitivities,

$$S(X_i, \alpha_j)_A = S(X_i, \alpha_j)_B, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, n, \tag{15}$$

for any $X_i$ and k − 1 different rate constants $\alpha_j$. Different results will be
obtained, depending upon which of the parameter sensitivities are not used
in this procedure.

For example, consider the case in which all n − 1 intermediates feed
back on the first step in the pathway. If the unconstrained sensitivity in Eq.
15 is $S(X_n, \alpha_n)$, then the values of the primed parameters are given by

log(ai)  =  log(«|)  +  - .  ^  log(an+1/an),  (16) 

<5n+l,n  6nn 

sip  =  Sip  0  <  p  <  n  —  1,  (17) 

,  |  Sin 

Sl.n-l  Sl,n-1  a  Sn,n-1*  (18) 

6n  +  l,n  «*>nn 

If the unconstrained sensitivity in Eq. 15 is $S(X_i, \alpha_1)$, then the values of the
primed parameters are


log(al)  =  log(Qi)  - 

Sn  +  I,n  Sin 


8,JTT  log(a„+)), 


Sn+I,n  Sin 


09) 


gip=„  _p  8 ip  0sp<«-l.  (20) 

62n  6  In 

If the unconstrained sensitivity in Eq. 15 is $S(X_i, \alpha_j)$, where $1 < j < n$, then
the values of the primed parameters are

log(“!)  =  log(«i)  -  - log(an+t/aj+l),  (21) 

ojn  6j  +  l,n 

Sip  =  Sip  (22) 

#!p  =  SiP  ~  ~~ - S2j  j~1<p.  (23) 

ojn  oj+  l,n 

Because the objective of a controlled comparison is to minimize the
differences between the systems being compared, we chose the unconstrained
sensitivity that leads to the smallest number of systemic properties
with values that differ between the reference system and the alternative
system. The systemic differences are minimized when the unconstrained
sensitivity is $S(X_i, \alpha_{n+1})$; any other choice leads to at least one additional
systemic property that differs between the two systems.

If  only  a  subset  of  the  intermediates  feed  back  on  the  first  step  of  the 
pathway,  and  if  we  use  the  constraint  set  that  causes  the  smallest  number 
of  properties  to  be  different  between  systems  A  and  B,  then  each  kinetic 
order  representing  a  feedback  inhibition  has  the  same  value  in  both  models, 
except  for  the  kinetic  order  representing  the  last  intermediate  to  feed  back 
on  the  first  step  of  the  pathway.  In  general, 
where $X_k$ is the last intermediate to feed back on the first step of the
pathway, and $A_{nk}$ is a positive subdeterminant of [A] that depends on the
actual $X_k$ and on the length n of the pathway. The kinetic orders $g_{1p}$ with
p < k are the same for both systems. As for the rate constant $\alpha'_1$, its general
form is


where  *kp  is  either  a  function  of  the  kinetic  orders  or  zero. 

For the special case in which the final product is the only metabolite to
feed back on the initial step, the primed parameters are given by

$$\log(\alpha'_1) = \frac{g_{n+1,n}}{g_{n+1,n} - g_{1n}} \log(\alpha_1) - \frac{g_{1n}}{g_{n+1,n} - g_{1n}} \log(\alpha_{n+1}), \tag{26}$$

$$g'_{10} = \frac{g_{n+1,n}}{g_{n+1,n} - g_{1n}}\, g_{10}. \tag{27}$$

This means that $g'_{10}$ is always smaller than $g_{10}$. (To contrast these results
with the analogous results expressed within the Michaelis-Menten formalism,
see the Appendix.)
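The constraints for this special case can be checked numerically. The sketch below is a worked example under stated assumptions rather than the paper's procedure: the end product is the only metabolite acting on the first reaction, the demand variable is normalized to X_{n+1} = 1 so that its logarithm vanishes, and the primed parameters are obtained here by equating the end-product concentration and its logarithmic gain with respect to X_0 in the two designs. All names and values are hypothetical.

```python
import math

def steady_xn(alpha1, g10, g1n, alpha_out, g_out, x0):
    """End-product concentration from V_1 = V_{n+1} at steady state.

    V_1 = alpha1 * X0**g10 * Xn**g1n, V_{n+1} = alpha_out * Xn**g_out (X_{n+1} = 1).
    """
    y = (math.log(alpha1 / alpha_out) + g10 * math.log(x0)) / (g_out - g1n)
    return math.exp(y)

# Reference design with overall feedback (illustrative values).
alpha1, g10, g1n = 2.0, 1.0, -0.5      # g1n < 0: inhibition by the end product
alpha_out, g_out, g_dem = 1.0, 0.5, 1.0

# Primed parameters of the alternative design without overall feedback.
ratio = g_out / (g_out - g1n)
g10_alt = ratio * g10
alpha1_alt = math.exp(ratio * math.log(alpha1)
                      - (g1n / (g_out - g1n)) * math.log(alpha_out))

# Equivalence check over a range of substrate levels.
pairs = [(steady_xn(alpha1, g10, g1n, alpha_out, g_out, x0),
          steady_xn(alpha1_alt, g10_alt, 0.0, alpha_out, g_out, x0))
         for x0 in (0.5, 1.0, 3.0)]

# Logarithmic gain of the end product with respect to demand in each design.
gain_dem_ref = -g_dem / (g_out - g1n)
gain_dem_alt = -g_dem / g_out
```

With these values the two designs share the same end-product concentration at every substrate level tested, while the end-product concentration responds less strongly to demand in the design with overall feedback.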


Numerical  comparison 

It  is  straightforward  to  compare  analytically  corresponding  magnitudes 
from  each  of  the  two  designs.  For  two-  and  three-step  pathways,  the 
comparisons  are  clearly  interpretable  for  most  systemic  properties.  The 
analytical  results  give  qualitative  information  that  characterizes  the  role  of 
overall  feedback  inhibition  for  the  system  in  Fig.  1  A.  As  the  length  of  the 
pathway  increases,  the  analytical  interpretation  becomes  problematic.  To 
determine  if  a  given  magnitude  is  larger  in  the  reference  system  or  the 
alternative  system  requires  knowledge  of  actual  parameter  values  in  these 
cases  and  a  method,  such  as  that  found  in  Alves  and  Savageau  (2000a), 
for  making  the  numerical  equivalent  of  a  mathematically  controlled 
comparison. 

To  obtain  numerical  information,  one  must  introduce  specific  values  for 
the  parameters  and  compare  systems.  For  this  purpose,  we  have  randomly 
generated  a  large  ensemble  of  parameter  sets  and  selected  5000  of  these 
sets  that  define  systems  consistent  with  various  physical  and  biochemical 
constraints.  These  constraints  include  mass  balance,  low  concentrations  of 
intermediates  and  small  changes  in  their  values  to  minimize  utilization  of 
the  limited  solvent  capacity  in  the  cell,  small  values  for  parameter  sensi¬ 
tivities  so  as  to  desensitize  the  system  to  spurious  fluctuations  affecting  its 
structure,  and  stability  margins  large  enough  to  ensure  local  stability  of  the 
systems.  A  detailed  description  of  these  methods  can  be  found  in  Alves  and 
Savageau  (2000c).  Mathematica  (Wolfram,  1997)  was  used  for  all  the 
numerical  procedures.  Pathways  of  up  to  seven  steps  were  studied  using 
this  numerical  methodology. 

To  interpret  the  ratios  that  result  from  our  analysis,  we  use  density  of 
ratios  plots  as  defined  in  Alves  and  Savageau  (2000b).  The  primary  density 
plots  of  the  raw  data  have  the  magnitude  of  some  property  for  the  reference 
system  on  the  x-axis  and  the  corresponding  ratio  of  magnitudes  (alternative 
system to reference system) on the y-axis. The primary plot can be viewed
as a list of 5000 paired values that can be ordered with respect to the
reference magnitude, thereby forming a list $L_1$ in which the first pair has
the lowest measured value for property P in the reference model, the
second has the second lowest, and so on.


Secondary  density  plots  are  constructed  from  the  primary  plots  by  the 
use  of  moving  quantile  techniques  with  a  window  size  of  500.  The 
procedure  is  as  follows.  One  collects  the  first  500  ratios  from  the  list  Llt 
calculates  the  quantile  of  interest  for  this  sample,  and  pairs  this  number  ( R ) 
with  the  median  value  of  the  corresponding  P  values  for  the  reference 
model,  denoted  (P).  One  advances  the  window  by  one  position,  collects 
ratios  2-501,  calculates  (/?),  and  pairs  it  with  the  corresponding  (P)  value 
and  continues  in  this  manner  until  the  last  ratio  from  the  list  L,  was  used 
for  the  first  time  (for  further  explanation  of  moving  median  techniques  see, 
e.g.,  Hamilton,  1994). 
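The moving quantile procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the Mathematica code used by the authors: the function names are hypothetical, and the linear-interpolation definition of a quantile is one of several possible choices.

```python
import statistics

def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics (an assumption)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def moving_quantile(pairs, window=500, q=0.5):
    """Return the (<P>, <R>) points of a secondary density plot.

    pairs: iterable of (P, R), where P is the magnitude in the reference
           system and R is the corresponding ratio of magnitudes.
    """
    ordered = sorted(pairs, key=lambda pr: pr[0])     # the ordered list
    points = []
    # Advance the window one position at a time until the last ratio
    # has been used for the first time.
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        p_median = statistics.median(p for p, _ in chunk)
        ratios = sorted(r for _, r in chunk)
        points.append((p_median, quantile(ratios, q)))
    return points
```

With 5000 pairs and a window of 500, this yields 4501 points; `q=0.5` gives the moving median, and other values of `q` give the other quantile curves.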

The  slope  in  the  secondary  plot  measures  the  degree  of  correlation 
between  the  quantities  plotted  on  the  x-  and  y-axes.  This  technique  also  is 
used to examine correlations between ratios of interest and other
magnitudes shared by the two systems, e.g., the correlation between the ratio of
stability  margins  and  the  magnitude  of  a  rate  constant  common  to  the  two 
systems  (for  traditional  applications  of  correlation  analysis,  see  Wherry, 
1984). 
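One simple way to put a number on the slope of a secondary plot is an ordinary least-squares fit through its points. The text does not say how the slope is evaluated, so the helper below (a hypothetical name) is only one possible reading.

```python
def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx
```

A positive slope corresponds to a positive correlation between the ratio and the reference magnitude, a negative slope to a negative correlation, and a slope near zero to no correlation.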


RESULTS 

Mathematically  controlled  comparison 

Response  to  availability  of  substrate  and  demand  for 
end  product 

The responsiveness of each system to changes in the independent
concentration variables $X_0$, which represents the
availability of initial substrate, and $X_{n+1}$, which represents
the demand for end product, is characterized by a set of
logarithmic gains that provides a quantitative measure of
signal propagation through the system.

The logarithmic gains of the two systems in response to
changes in the initial substrate are identical at each step in
the pathway [i.e., $L(V_i, X_0)_A = L(V_i, X_0)_B$ and
$L(X_i, X_0)_A = L(X_i, X_0)_B$ for $1 \le i \le n$] because of the constraints for
external equivalence described in the Methods section.
Hence, the responsiveness of the two systems to changes in
the availability of initial substrate is identical.

In  contrast,  the  responsiveness  of  the  two  systems  to 
changes  in  the  demand  for  their  end  product  is  different.  The 
ratio  of  the  logarithmic  gains  in  flux  is  given  by 


$$
\frac{L(V, X_{n+1})_A}{L(V, X_{n+1})_B} = 1 + \frac{g_{1n}}{\Gamma} \prod_{j=1}^{n} g_{j+1,j} > 1, \tag{28}
$$

where $\Gamma$ is always a negative sum of products of the kinetic
orders, $g_{1n} < 0$, and $g_{j+1,j} > 0$ for $j = 1, 2, \ldots, n$.
These  results  demonstrate  that  the  flux  in  the  reference 
system  is  more  responsive  than  that  in  the  alternative  system 
to  changes  in  demand  for  end  product. 

The  ratio  of  the  logarithmic  gains  in  concentration  is 
given  by 


$$
\frac{L(X_i, X_{n+1})_A}{L(X_i, X_{n+1})_B} = 1 + \frac{g_{1n}}{\zeta} \prod_{j=1}^{n-1} g_{j+1,j}, \qquad i = 1, 2, \ldots, n, \tag{29}
$$

where $\zeta$ is a sum of products of the kinetic orders that
depends on $i$ and the length of the pathway, $g_{1n} < 0$, and
$g_{j+1,j} > 0$ for $j = 1, 2, \ldots, n - 1$. When $i = 1$ or $i = n$,
$\zeta$ is always positive and, thus, the reference model is always
less sensitive to demand. When $1 < i < n$, $\zeta$ is positive in
most cases. This shows that the concentrations are usually
less sensitive to demand in the system with overall feedback
inhibition.
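The two demand-response results can be checked numerically for the smallest case, a two-step pathway ($n = 2$). The sketch below assumes an S-system in which $X_3$ is the demand variable with kinetic order $g_{33} = 1$, and it takes the external-equivalence condition for system B from the two-step expression given in the Stability section; the function names and the example kinetic orders are hypothetical.

```python
def two_step_gains(g11, g12, g21, g22, g32, g33=1.0):
    """Return (L(V, X3), L(X2, X3)) for a two-step pathway at steady state."""
    # Elements a_ij = g_ij - g_(i+1),j of the steady-state system matrix.
    a11, a12 = g11 - g21, g12 - g22
    a21, a22 = g21, g22 - g32
    det = a11 * a22 - a12 * a21
    conc_gain = a11 * g33 / det            # d ln X2 / d ln X3
    flux_gain = g33 + g32 * conc_gain      # V = alpha3 * X2**g32 * X3**g33
    return flux_gain, conc_gain

def equivalent_alternative(g11, g12, g21, g22, g32):
    """Kinetic orders (g11', g12') of system B: no end-product feedback."""
    return g11 + g12 * g21 / (g32 - g22), 0.0

# Hypothetical reference system A with overall feedback (g12 < 0).
g_ref = (-0.5, -1.0, 1.0, -0.2, 1.0)
g11_alt, g12_alt = equivalent_alternative(*g_ref)
flux_a, conc_a = two_step_gains(*g_ref)
flux_b, conc_b = two_step_gains(g11_alt, g12_alt, 1.0, -0.2, 1.0)
flux_ratio = flux_a / flux_b    # greater than 1 for this parameter set
conc_ratio = conc_a / conc_b    # between 0 and 1 for this parameter set
```

For these values the flux of the reference system responds more strongly to demand while its end-product concentration responds less, in line with the analytical statements for Eqs. 28 and 29.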

Robustness  of  flux 

The  robustness  of  any  systemic  property  with  respect  to 
perturbations  in  the  values  of  the  parameters  that  define  the 
system  is  characterized  by  a  set  of  parameter  sensitivities. 
The steady-state flux of reference and alternative systems
has different sensitivities with respect to the parameters $\alpha_n$,
$\alpha_{n+1}$, $g_{1,n-1}$, $g_{n+1,n}$, $g_{n,n-1}$, and $g_{nn}$ that are common to the
two systems. The sensitivities are the same with respect to
all other parameters common to the two systems.

The sensitivities of the steady-state flux with respect to
the parameters $\alpha_n$, $g_{nn}$, and $g_{n,n-1}$ exhibit a common pattern.
If we take the ratio of a sensitivity in the reference system
to the corresponding sensitivity in the alternative system,
we find that the ratio of the sensitivities is always less than
1. That is,

$$
\frac{S(V, p)_A}{S(V, p)_B} = 1 + \frac{g_{1n}}{\gamma} \prod_{j=1}^{n-1} g_{j+1,j} < 1, \tag{30}
$$

where $\gamma$ is a positive sum of products of the kinetic orders,
$g_{1n} < 0$, $g_{j+1,j} > 0$ for $j = 1, 2, \ldots, n - 1$, and

$$
-1 < \frac{g_{1n}}{\gamma} \prod_{j=1}^{n-1} g_{j+1,j} < 0. \tag{31}
$$


Thus,  the  flux  in  the  reference  system  is  less  sensitive  to 
parameter  variations,  i.e.,  is  more  robust  than  that  in  the 
alternative  system. 

The sensitivities of the steady-state flux with respect to
the parameter $g_{1,n-1}$ exhibit a similar pattern. The ratio of
the sensitivities in this case is given by

$$
\frac{S(V, g_{1,n-1})_A}{S(V, g_{1,n-1})_B} = \frac{1}{1 - \dfrac{g_{1n}\, g_{n,n-1}}{g_{1,n-1}\,(g_{nn} - g_{n+1,n})}} < 1. \tag{32}
$$

Although the function of the kinetic orders is different from
that in Eq. 30, the flux in the reference system is again less
sensitive to parameter variations, i.e., is more robust than
that in the alternative system.

In contrast, the ratio of the sensitivities with respect to the
parameters $\alpha_{n+1}$ and $g_{n+1,n}$ exhibits a different pattern,

$$
\frac{S(V, p)_A}{S(V, p)_B} = 1 + \frac{g_{1n}}{\xi} \prod_{j=1}^{n} g_{j+1,j} > 1, \tag{33}
$$

where $\xi$ is a negative sum of products of the kinetic orders.
These parameter sensitivities are related to the last enzyme
and reflect the design for responsiveness to changes in
demand for end product.

As the position of the last intermediate that provides
feedback inhibition to the first reaction approaches the
beginning of the pathway, the number of sensitivities that
differ between reference and alternative systems increases.
This is so because the number of primed parameters
decreases and a smaller number of conditions for external
equivalence are needed to eliminate the extra degrees of
freedom. In general, if the last intermediate that provides an
inhibitory feedback to the first reaction is $X_k$ for $k < n - 1$,
then the sensitivities of the flux to the rate constants $\alpha_k$ to
$\alpha_{n+1}$ and those to the kinetic orders $g_{ij}$ ($k < i < n$ and
$i < j \le n$) will differ between the reference and the alternative
systems. In most cases, the sensitivities will be less in the
reference system. There are exceptions to this, depending on
the length of the pathway and on the last intermediate that
provides feedback inhibition to the first step, and, in the
case of $\alpha_{n+1}$ and $g_{n+1,n}$, the sensitivities of the reference
system will always be greater, for the reasons we have
already mentioned.


Robustness of concentrations

The steady-state concentrations of reference and alternative
systems have different sensitivities with respect to many
parameters that define the systems. In some cases, the ratio
of the corresponding sensitivities is always <1 or always
>1, but, in others, the ratio is <1 for some values of the
parameters and >1 for other values. In the latter cases, an
examination of actual numerical values for the parameters is
critical.

The ratio of sensitivities for the concentration of each
intermediate in the pathway with respect to changes in the
kinetic order $g_{1,n-1}$ is identical to that given in Eq. 32.
Similarly, the ratio of sensitivities for $X_n$ with respect to
changes in the rate constants $\alpha_n$ or $\alpha_{n+1}$ is always of the
form

$$
\frac{S(X_n, \alpha_p)_A}{S(X_n, \alpha_p)_B} = 1 + \frac{g_{1n}}{\zeta_p} \prod_{j=1}^{n-1} g_{j+1,j} < 1, \qquad p = n, n + 1, \tag{34}
$$

where $\zeta_p$ is a different positive sum of products of kinetic
orders for each $\alpha_p$, $p = n, n + 1$, and

$$
-1 < \frac{g_{1n}}{\zeta_p} \prod_{j=1}^{n-1} g_{j+1,j} < 0, \qquad p = n, n + 1. \tag{35}
$$

Thus, the reference system is always less sensitive to
changes in these parameters.
In contrast, the ratio of sensitivities for $X_n$ with respect to
changes in the kinetic orders $g_{n+1,n}$, $g_{n,n-1}$, or $g_{nn}$ is always
of the form

$$
\frac{S(X_n, g_{pq})_A}{S(X_n, g_{pq})_B} \tag{36}
$$

where $\zeta_{pq}$ is a different positive sum of products of the
kinetic orders for each $g_{pq}$. In this case, the ratio can be >1
or <1. This means that the sensitivity of the reference
system will be greater than the sensitivity of the alternative
system for some values of the parameters and less for
others. Similarly, the ratio of sensitivities for each
intermediate $X_i$ with respect to changes in each parameter can be
greater or less than 1, depending on values of the parameters.

Again, as the position of the last intermediate that
provides feedback inhibition to the first reaction approaches the
beginning of the pathway, the number of sensitivities that
differ between reference and alternative systems increases.
In general, if the last intermediate that provides an
inhibitory feedback to the first reaction is $X_k$, then the ratio of
sensitivities for each metabolite with respect to changes in
the kinetic order $g_{1k}$ is given by

$$
\frac{S(X_i, g_{1k})_A}{S(X_i, g_{1k})_B} < 1, \qquad i = 1, 2, \ldots, n. \tag{37}
$$

In this equation, $\zeta_{1k}$ is a positive subdeterminant of the [A]
matrix. The ratio of sensitivities for the end product with
respect to changes in each of the parameters common to the
two systems also is always <1. Similarly, the ratio of
sensitivities for the last intermediate that feeds back to the
first reaction, $X_k$, with respect to the parameters $\alpha_k$ or $g_{kj}$
($k < j < n$) is always <1. Thus, the reference system is
always more robust than the alternative system in these
cases. As for the remaining cases, the sensitivities of the
reference system will be greater than the sensitivities of the
alternative system for some values of the parameters and
less for others.


Stability 

The characteristic equation for Eqs. 1-3 operating near the
steady state can be written as

$$
\begin{vmatrix}
F_1 a_{11} - \lambda & F_1 a_{12} & \cdots & \cdots & F_1 a_{1n} \\
F_2 a_{21} & F_2 a_{22} - \lambda & \cdots & \cdots & F_2 a_{2n} \\
0 & F_3 a_{32} & \cdots & \cdots & F_3 a_{3n} \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & \cdots & F_{n-1} a_{n-1,n-1} - \lambda & F_{n-1} a_{n-1,n} \\
0 & \cdots & \cdots & F_n a_{n,n-1} & F_n a_{nn} - \lambda
\end{vmatrix} = 0, \tag{38}
$$

where $F_i = V_{i0}/X_{i0}$ and $a_{ij} = g_{ij} - g_{i+1,j}$. Eq. 38 can be
expanded into polynomial form and the Routh conditions
for local stability determined. The last two Routh conditions
are critical for stability (Frazer and Duncan, 1929). The last
condition is equivalent to the condition $(-1)^n \det(A) > 0$,
which is always true for the systems we are considering
(Savageau, 1976, Appendix B).

The two critical Routh conditions for a two-step pathway
are

$$
R_1 = F_1 (g_{11} - g_{21}) + F_2 (g_{22} - g_{32}) < 0 \tag{39}
$$

and

$$
R_2 = F_1 F_2 [g_{11}(g_{22} - g_{32}) + g_{21}(g_{32} - g_{12})] > 0. \tag{40}
$$

Both these conditions are always satisfied for both system A
($g_{12} < 0$) and system B ($g'_{12} = 0$ and
$g'_{11} = g_{11} + g_{12} g_{21}/(g_{32} - g_{22}) < g_{11} < 0$), so these systems are always
stable. The ratio of the last Routh condition for the two
systems is equal to unity, whereas that for the penultimate
condition is given by

$$
\frac{R_{1A}}{R_{1B}} = 1 - \frac{F_1 g_{12} g_{21}}{F_1 g_{12} g_{21} - F_1 g_{11} g_{22} + F_1 g_{21} g_{22} - F_2 g_{22}^2 + F_1 g_{11} g_{32} - F_1 g_{21} g_{32} + 2 F_2 g_{22} g_{32} - F_2 g_{32}^2} < 1. \tag{41}
$$

Thus, the stability margin is larger for the alternative
system B.
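The two-step result can be illustrated with a few lines of arithmetic. The sketch below evaluates both Routh conditions for a reference system and for its externally equivalent alternative; the function name and the numerical values of the kinetic orders and of $F_1$, $F_2$ are hypothetical and chosen only to satisfy the sign constraints stated above.

```python
def routh_two_step(F1, F2, g11, g12, g21, g22, g32):
    """Return the penultimate (R1) and last (R2) Routh conditions."""
    r1 = F1 * (g11 - g21) + F2 * (g22 - g32)                      # must be < 0
    r2 = F1 * F2 * (g11 * (g22 - g32) + g21 * (g32 - g12))        # must be > 0
    return r1, r2

F1, F2 = 1.0, 2.0
g11, g12, g21, g22, g32 = -0.5, -1.0, 1.0, -0.2, 1.0

# System B: feedback from X2 removed, g11 adjusted for external equivalence.
g11_alt = g11 + g12 * g21 / (g32 - g22)

r1_a, r2_a = routh_two_step(F1, F2, g11, g12, g21, g22, g32)
r1_b, r2_b = routh_two_step(F1, F2, g11_alt, 0.0, g21, g22, g32)

last_ratio = r2_a / r2_b          # equal to unity, up to rounding
penultimate_ratio = r1_a / r1_b   # below 1: larger margin in system B
```

Both systems are stable for these values, and only the penultimate condition distinguishes them.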

The two critical Routh conditions for a three-step
pathway are already considerably more complex. Whereas the
last condition is always positive, the most critical condition
is the penultimate one that can be positive or negative,
depending upon the particular values for the parameters.
The ratio of the last condition for the two systems is equal
to 1; the ratio of the penultimate condition can be >1 or <1,
depending on the values for the parameters. These same
conclusions are obtained for pathways of length four or
greater: the ratios cannot be determined analytically to be
>1 or <1, and we must resort to numerical methods.


Transient  time 

There is no analytical way to accurately calculate the
transient times of the pathway. This must be done numerically.
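A numerical estimate of the transient time can be obtained by integrating the rate equations directly. The sketch below does this for a two-step S-system with a fixed-step fourth-order Runge-Kutta scheme, using the definition given in the legend of Fig. 2 (time to return within 1% of the steady state after a 15% perturbation). The normalization (all rate constants and independent variables set to 1, so that the steady state is $X_1 = X_2 = 1$), the step size, the time horizon, and the function name are assumptions made for illustration.

```python
def transient_time(g, x_init=(1.15, 1.15), dt=0.01, t_max=50.0, tol=0.01):
    """Last time at which X1 or X2 is more than `tol` away from steady state."""
    g11, g12, g21, g22, g32 = g

    def deriv(x1, x2):
        v1 = x1 ** g11 * x2 ** g12      # first step, with feedback terms
        v2 = x1 ** g21 * x2 ** g22      # second step
        v3 = x2 ** g32                  # consumption of end product
        return v1 - v2, v2 - v3

    x1, x2 = x_init
    t, last_outside = 0.0, 0.0
    while t < t_max:
        # One classical fourth-order Runge-Kutta step.
        k1 = deriv(x1, x2)
        k2 = deriv(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1])
        k3 = deriv(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1])
        k4 = deriv(x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += dt
        if abs(x1 - 1.0) > tol or abs(x2 - 1.0) > tol:
            last_outside = t
    return last_outside

# Hypothetical reference system and its equivalent alternative (g12' = 0).
tau_a = transient_time((-0.5, -1.0, 1.0, -0.2, 1.0))
tau_b = transient_time((-0.5 - 1.0 / 1.2, 0.0, 1.0, -0.2, 1.0))
```

Repeating this calculation over an ensemble of parameter sets and forming the ratio of the two times gives the raw data for a density of ratios plot.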

Numerical  comparisons 

Unlike the symbolic analysis performed in the previous
section, using actual numbers for the values of the
parameters limits the absolute generality of the results. However,
it does allow us to obtain general conclusions in a statistical
sense. The results described below have been obtained for
pathways of up to seven intermediates. The trends in these
results remain constant throughout all the tested lengths
(i.e., pathways from 2 to 7 intermediates), which suggests
that they will remain so for longer pathways. The use of
these numerical methods allows us not only to study the
effects of overall feedback inhibition, but also to study
correlations that exist between systemic properties and the
different parameters of the system.

Response  to  availability  of  substrate  and  demand  for 
end  product 

The logarithmic gains in concentrations of the two systems
in response to changes in the initial substrate $X_0$ are
identical at each step in the pathway because of the constraints
for external equivalence described in the Methods section.
The same is true for the logarithmic gains in flux. Hence,
the responsiveness of concentrations and fluxes in the two
systems to changes in the availability of initial substrate is
identical numerically as well as analytically.

The logarithmic gain in flux for system A in response to
changes in the demand for end product was shown
analytically to be greater than that for system B. The graph of
$L(V, X_{n+1})_A / L(V, X_{n+1})_B$ versus $L(V, X_{n+1})_A$ (Fig. 2 A),
which is the moving median density of ratios plot
introduced in Alves and Savageau (2000c), shows how much
greater, on average, the response is for system A. It also
shows a negative correlation between the ratio of responses
and the response of the reference system. This means that,
as $L(V, X_{n+1})_A$ increases, the ratio $L(V, X_{n+1})_A / L(V, X_{n+1})_B$
tends to decrease.

The logarithmic gain in end-product concentration for
system A in response to changes in the demand for end
product also was shown analytically to be smaller than that
for system B. The graph of $L(X_n, X_{n+1})_A / L(X_n, X_{n+1})_B$
versus $L(X_n, X_{n+1})_A$ (Fig. 2 B) shows how much smaller, on
average, the response is for system A. It also shows a
positive correlation between the ratio of responses and the
response of the reference system.

Robustness 

Figure 2 shows typical moving median density of ratios
plots for the aggregate parameter sensitivities of flux and
concentrations. The aggregate parameter sensitivity of the
flux $V$ is smaller, on average, for system A (Fig. 2 C).
Assume that $X_k$ is the last intermediate to feed back on the
first reaction of the pathway. The aggregate parameter
sensitivity of $X_k$ is smaller, on average, for system B (Fig. 2 D).
The average difference in aggregate sensitivities for this
metabolite is never larger than a few percent. With regard to
the remaining intermediates, the graphs for $X_i$ (Fig. 2 E) and
$X_j$ (Fig. 2 F) represent typical plots of aggregate parameter
sensitivities. In these cases, we find that random reference
systems are less sensitive than the equivalent alternative
systems. The average differences can range from a few
percent to fifty or more percent. The individual parameter
sensitivities of $X_n$ were analytically determined to be
smaller in system A. In the example presented here, the
difference is, on average, just a few percent (Fig. 2 G);
however, depending on the length of the pathway, this
difference can increase to more significant values.

The flux (Fig. 2 C) and concentrations $X_i$, $i < n$ (Fig. 2,
D, E, and F) show a positive correlation between the ratio of
their aggregate sensitivities in the two systems and the
aggregate sensitivity in the reference system when its value
is low. For systems with low sensitivities, system A is, on
average, much less sensitive than system B. For higher
values of the aggregate sensitivities in the reference system,
there is no correlation. In the case of $X_k$, the ratio is fairly
independent of the values of the aggregate sensitivity in the
reference system.

Stability 

The last critical Routh criterion is always the same in the
reference and alternative systems, as has been shown
analytically. For a two-step pathway, the margin of stability
determined by the penultimate criterion is always larger in
system B. For longer pathways, the margin of stability can
be larger in either the reference or the alternative system,
depending on the numerical values of the parameters. The
differences between the two systems with respect to this
penultimate criterion are small (on average less than 2%,
Fig. 2 H), which implies that systems with and without overall
feedback inhibition will have comparable stability margins.

Transient  time 

Fig. 2 I shows a typical moving median density of ratios plot
for transient time. This plot shows that the reference system
usually responds to perturbations in the steady state more
quickly than the alternative system. For reference systems
with a fast response to changes, the transient times can be,
on average, half that of the corresponding alternative
systems. For reference systems that are sluggish, the difference
is, on average, smaller, though it still exists.

Effects  of  parameter  values  on 
systemic  properties 

Rate-constant  effects  on  aggregate  sensitivities 

Assume that $X_k$ is the last intermediate to feed back on the
first reaction. Plotting the aggregate sensitivities as a
function of $\alpha_j$, $n \le j$, shows that there is a correlation between
each rate constant $\alpha_j$ and each of the aggregate sensitivities
(Fig. 3 A). For small $\alpha_j$, the correlation is either nonexistent
or slightly negative, whereas, for large values, this
correlation is positive. As for the other rate constants, with $j < n$,
there are no obvious correlations that are general for all the
pathway lengths studied, although, for some lengths,
specific correlations are observed.

Kinetic-order  effects  on  aggregate  sensitivities 

For $X_n$, the aggregate sensitivity is correlated with several
parameters. There is a positive correlation between this
sensitivity and $g_{1n}$. Because $g_{1n}$ is always negative, this
means that the aggregate sensitivity of $X_n$, $S(X_n)$, is usually
smaller for high values of overall feedback inhibition. The
same is true for the correlation between $S(X_n)$ and $g_{in}$ when
$i < n$ (Fig. 3 B). If $i = n$, there is a negative correlation
between this aggregate sensitivity and $g_{in}$. The correlation
of the aggregate sensitivities of the other intermediates with
$g_{1n}$ is usually small or nonexistent. There is a negative
correlation between the aggregate sensitivity of $X_i$ and $g_{i+1,i}$
or $g_{n,n-1}$ (Fig. 3 C) and a positive correlation between that
of $X_i$ and $g_{ii}$ (Fig. 3 D). Also, the aggregate sensitivity of
each $X_i$ is negatively correlated with $g_{n+1,n}$ (Fig. 3 E). These
are the correlations that are generally observed for the
aggregate sensitivities of concentrations, although other
individual correlations can be found for specific intermediates
and specific pathway lengths.

The correlations between aggregate sensitivities of flux
and the various kinetic-order parameters are less clear. The
correlation with $g_{n+1,n}$ is positive for low values of $g_{n+1,n}$,
but it disappears as the value of $g_{n+1,n}$ increases (Fig. 3 F).
The only other general correlation observed is that between
the aggregate sensitivity of the flux and the kinetic order
$g_{n,n-1}$. This is a negative correlation that also vanishes as
the value of $g_{n,n-1}$ increases. This can be seen in Fig. 3 G.

Biophysical Journal 79(5) 2290-2304

Alves and Savageau

FIGURE 2 Typical moving median density of ratios plots for different magnitudes. The values on the X-axis represent the moving median of the relevant
magnitude in the reference system. The values on the Y-axis represent the moving median of the ratio of that magnitude in the reference system to the
corresponding magnitude in the alternative system. (A) Logarithmic gain in flux in response to changes in demand for the end product, $L(V, X_{n+1})$. (B)
Logarithmic gain in end-product concentration in response to changes in demand for the end product, $L(X_n, X_{n+1})$. (C) Aggregate sensitivity of the pathway
flux, $S(V)$. (D) Aggregate sensitivity of the concentration of the last intermediate to feed back on the first reaction, $S(X_k)$. (E) Aggregate sensitivity of the
concentration of any intermediate in the pathway before $X_k$, $S(X_i)$. (F) Aggregate sensitivity of the concentration of any intermediate in the pathway after
$X_k$, $S(X_j)$. (G) Aggregate sensitivity of the concentration of the end product, $S(X_n)$. (H) The penultimate (i.e., $n - 1$st) Routh criterion; this represents the
margin of stability. (I) Transient time, $\tau$ in normalized units, is the time the pathway takes to return within 1% of its steady state following a 15%
perturbation in the steady-state values. Each of these plots is for a specific pathway length; only the parameter values are changed randomly. However,
because the trends observed for different pathway lengths are the same, we have only shown a representative case.

FIGURE 3 Typical moving median correlation plots between different systemic properties and different kinetic parameters of the reference system. The
values on the X-axis represent the moving median of the relevant kinetic parameter. The values on the Y-axis represent the moving median of the relevant
systemic property. (A) Aggregate sensitivity of the concentration of any pathway intermediate $X_i$ versus the rate-constant parameters $\alpha_n$ or $\alpha_{n+1}$. (B)
Aggregate sensitivity of the concentration of the end product $X_n$ versus the kinetic-order parameters $g_{in}$ or $g_{i+1,n}$. (C) Aggregate sensitivity of the
concentration of any pathway intermediate $X_i$ versus the kinetic-order parameters $g_{i+1,i}$ or $g_{n,n-1}$. (D) Aggregate sensitivity of the concentration of any
pathway intermediate $X_i$ versus the kinetic-order parameter $g_{ii}$. (E) Aggregate sensitivity of the concentration of the end product $X_n$ versus the kinetic-order
parameter $g_{n+1,n}$. (F) Aggregate sensitivity of the pathway flux $V$ versus the kinetic-order parameter $g_{n+1,n}$. (G) Aggregate sensitivity of the pathway flux
$V$ versus the kinetic-order parameter $g_{n,n-1}$. (H) Transient time $\tau$ versus the kinetic-order parameter $g_{ii}$. (I) Transient time $\tau$ versus the kinetic-order
parameter $g_{i+1,i}$. Each of these plots is for a specific pathway length; only the parameter values are changed randomly. However, because the trends
observed for different pathway lengths are the same, we have only shown a representative case.


Rate-constant  and  kinetic-order  effects  on 
margin  of  stability 

The correlations between a given Routh criterion and the
various parameters depend on which criterion is
considered. The results are pathway length-specific, and no
general trend can be found.


Rate-constant  and  kinetic-order  effects  on  transient  time 

There is no clear correlation between transient time and the
various rate constants. There are, however, positive
correlations between transient time and the kinetic orders $g_{ii}$, for
$i \ge 1$ (Fig. 3 H). There also are negative correlations
between transient time and the kinetic orders $g_{i+1,i}$ for $i > 1$
(Fig. 3 I). These were the only observed correlations with
transient time.


Effects  of  enzyme  levels  on  systemic  variables 

We have determined the logarithmic gains in flux and
concentrations in response to changes in the level of
individual enzymes. When comparing logarithmic gains in flux
and concentrations in the reference and alternative systems,
the equivalence conditions will make all corresponding
coefficients identical except the last two. We also have
examined the correlations among the logarithmic gains.

The  last  two  logarithmic  gains  in  concentrations  are,  on 
average,  lower  in  the  system  controlled  by  overall  feedback 
inhibition  (see  also  Eq.  34).  However,  there  is  no  general 
pattern  of  correlation  among  the  logarithmic  gains  in 
concentrations. 

The penultimate logarithmic gain in flux is always larger
in the alternative system (Fig. 4 C). The last logarithmic
gain in flux, which is a measure of coupling between flux
and the demand for final product, is always larger in the
reference system (Fig. 4 D). The logarithmic gains in flux
with respect to changes in each individual enzyme except
the last are directly correlated (Fig. 4 A, B, and C). The last
logarithmic gain in flux is inversely correlated with all the
others (Fig. 4 D). This is a well-known effect of feedback
inhibition, i.e., it decreases the sensitivity of the flux
through the system to parameters (in this case enzyme
levels) inside the feedback loop while increasing the
sensitivity to parameters outside the loop.
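The pattern described for the enzyme-level gains can be reproduced for a two-step pathway with three enzymes. The sketch below assumes that each enzyme level enters its rate law as a proportional factor of the rate constant $\alpha_i$, so that the logarithmic gain with respect to an enzyme equals the gain with respect to $\alpha_i$; the function name and numerical values are hypothetical.

```python
def flux_gains_two_step(g11, g12, g21, g22, g32):
    """Logarithmic gains of steady-state flux w.r.t. alpha_1, alpha_2, alpha_3."""
    a11, a12 = g11 - g21, g12 - g22
    a21, a22 = g21, g22 - g32
    det = a11 * a22 - a12 * a21
    gain_1 = g32 * g21 / det           # first enzyme
    gain_2 = -g32 * g11 / det          # penultimate enzyme (n = 2)
    gain_3 = 1.0 + g32 * a11 / det     # last enzyme, consuming the end product
    return gain_1, gain_2, gain_3

g11, g12, g21, g22, g32 = -0.5, -1.0, 1.0, -0.2, 1.0
g11_alt = g11 + g12 * g21 / (g32 - g22)       # external equivalence, g12' = 0

gains_a = flux_gains_two_step(g11, g12, g21, g22, g32)       # reference
gains_b = flux_gains_two_step(g11_alt, 0.0, g21, g22, g32)   # alternative
```

For these values the first gain is the same in both systems, the penultimate gain is larger in the alternative system, the last gain is larger in the reference system, and the gains of each system sum to 1.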


DISCUSSION 

In this paper, we are addressing a generic property
characteristic of an entire class of biochemical systems: Why is the
pattern of overall feedback inhibition in unbranched
biosynthetic pathways so prevalent? Because there are
innumerable specific cases that could be examined, most of which
have never arisen or may no longer exist because of natural
selection, one could never hope to answer this type of
question with an experimental approach. However, on a
more fundamental level (beyond the sheer number of
possibilities that would have to be constructed and examined),
one must face the difficulty of performing even a single
experimental comparison under well-controlled conditions
so that the results will not be confused by extraneous
differences.

The method of mathematically controlled comparison
was developed specifically to address these issues. It allows
one to examine enormous numbers of alternatives in
parallel, more than would ever be possible by experimental
means; it also allows essentially ideal controlled
comparisons, comparisons that could only be done with an
enormous experimental effort. In short, this is the type of
question that is more appropriately answered by means of a
theoretical analysis than by the accumulation of
experimental evidence for one specific system after another.

The experimental difficulty in doing the equivalent of a
mathematically controlled comparison can be seen from the
expressions in the Appendix. One would first have to
generate a large number of feedback-resistant mutants. Each
independent mutant would, in general, have different values
for the resulting $K'_m$ and $V'_m$ parameters. One would have to
measure the $K_m$ for each of the mutants until one was found
that had the appropriate value, as determined by the
constraints for external equivalence in Eqs. A4-A8. If one was
lucky enough to find that this mutant also had the correct
value for $V'_m$, as determined by the constraints for external
equivalence in Eqs. A4-A8, then one could measure the
systemic differences between the wild-type and mutant to
experimentally verify the theoretical results. If the value
was not appropriate, one might construct a mutant strain
with the structural gene for the first enzyme under the
control of a promoter whose activity can be independently
varied. In such a construct, one might be able to adjust the
promoter activity to provide the appropriate value for $V'_m$.
Again, one could measure the systemic differences between
the wild-type and mutant to experimentally verify the
theoretical results. As can be seen from this discussion of what
it would take to do the experiments properly, it is unlikely
that anyone would undertake the task. This is especially so
when the result will only be valid for one special system,
and will not contribute significantly to the validation of the
general principle.

This discussion is in no way a criticism of the experimental
approach. It simply acknowledges the fact that only specific
theoretical predictions are amenable to direct experimental test.
More general theoretical predictions that apply to an entire class of
systems require experimental information for many members of the
class. The experimental validation of the theory presented here is
the fact that it can account for the prevalence of overall feedback
inhibition in biosynthetic pathways.

FIGURE 4 Typical moving median correlation plots between different
logarithmic gains in flux with respect to changes in individual
enzyme levels. The values on the X-axis represent the moving median
of the logarithmic gain with respect to the first enzyme of a
pathway. The values on the Y-axis represent the moving median of the
logarithmic gains with respect to subsequent enzymes in the pathway.
Full lines indicate curves for the reference system, and dashed lines
indicate curves for the alternative system. (A) Logarithmic gain in
flux with respect to the second enzyme of the pathway versus
logarithmic gain in flux with respect to the first enzyme of the
pathway. (B) Logarithmic gain in flux with respect to the ith enzyme
of the pathway (i ≠ 1, 2, n, n + 1) versus logarithmic gain in flux
with respect to the first enzyme of the pathway. (C) Logarithmic gain
in flux with respect to the penultimate enzyme of the pathway versus
logarithmic gain in flux with respect to the first enzyme of the
pathway. (D) Logarithmic gain in flux with respect to the last enzyme
of the pathway versus logarithmic gain in flux with respect to the
first enzyme of the pathway. Each of these plots is for a specific
pathway length; only the parameter values are changed randomly.
However, because the trends observed for different pathway lengths
are the same, we have only shown a representative case.
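
As a rough illustration of how a moving median plot such as Fig. 4 can be built from randomly sampled systems, the following Python sketch sorts paired values by the X quantity and takes medians over a sliding window. The function name `moving_median` and the window size are assumptions made for illustration; the text does not state the window the authors used.

```python
from statistics import median

def moving_median(xs, ys, window):
    """Sort (x, y) pairs by x, then take medians over a sliding window.

    Returns a list of (median_x, median_y) points, one per window position.
    """
    if window < 1 or window > len(xs) or len(xs) != len(ys):
        raise ValueError("window must fit inside equally long xs and ys")
    pts = sorted(zip(xs, ys))            # order the sample by the X quantity
    out = []
    for k in range(len(pts) - window + 1):
        chunk = pts[k:k + window]
        out.append((median(p[0] for p in chunk),
                    median(p[1] for p in chunk)))
    return out

# xs could hold the logarithmic gain for the first enzyme in each sampled
# system, ys the gain for a later enzyme in the same system.
points = moving_median([3, 1, 2, 5, 4], [30, 10, 20, 50, 40], window=3)
```

A wider window gives a smoother curve at the cost of hiding local structure, so the choice depends on how many systems were sampled.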

In this work, we have used a numerical generalization of the method
of mathematical controlled comparison to examine systemic properties
of models with and without overall feedback inhibition in unbranched
pathways that otherwise have an arbitrary pattern of feedback
inhibitions. In summarizing our findings, we shall interlace the
results of the older analytical approach with those of the more
recently developed numerical approach. This has the advantage of
showing how the numerical approach goes beyond the analytical
approach to broaden the scope of mathematical controlled comparison.

By using mathematically controlled comparisons, we have ensured that
the systems achieve the same steady-state flux, metabolite
concentrations, and logarithmic gains with respect to changes in the
concentration of initial substrate, whether overall feedback
inhibition is present or not. However, the alternative designs
exhibit differences for many other systemic properties. In the
following seven types of results, the analytical approach yields
unambiguous qualitative differences.

1.  The  logarithmic  gain  in  flux  resulting  from  an  increase  in
demand  for  end  product  is  always  greater  in  the  system 
with  overall  feedback  inhibition.  This  ensures  a  tighter 
control  of  the  material  flowing  through  the  pathway  by 
the  demand  for  such  material. 

2.  The  logarithmic  gain  in  the  concentration  of  the  first  and 
last  metabolite  resulting  from  an  increase  in  demand  for 
end  product  is  always  less  in  the  system  with  overall 
feedback  inhibition.  This  shows  that  these  concentrations 
tend  to  be  buffered  against  changes  in  demand  for  end 
product. 

3.  The  sensitivities  of  the  flux  to  changes  in  the  parameters 
of  the  intermediate  reactions  for  the  system  with  overall 
feedback  inhibition  are  less  than  or  equal  to  those  of  the 
otherwise  equivalent  system  without  this  inhibition.  This 
shows  that  overall  feedback  inhibition  increases  the  ro¬ 
bustness  of  the  flux. 

4.  The  sensitivities  of  the  flux  to  changes  in  the  parameters 
of  the  last  reaction  for  the  system  with  overall  feedback 
inhibition  are  greater  than  or  equal  to  those  of  the  oth¬ 
erwise  equivalent  system  without  this  inhibition.  This  is 
related  to  the  first  point  above. 

5.  The  sensitivity  of  the  end-product  concentration  to  each 
rate-constant  parameter  of  the  system  with  overall  feed¬ 
back  inhibition  is  always  less  than  or  equal  to  that  of  the 
otherwise  equivalent  system  without  this  mechanism. 
This  was  shown  to  be  analytically  true  independent  of 
pathway  length.  The  reference  system  is  thus  more  ef¬ 
fective  in  buffering  the  final  product  of  the  pathway 
against  parameter  fluctuations. 

6.  The  sensitivity  of  each  concentration  to  the  parameter 
representing  the  last  intermediate  to  feed  back  on  the  first 
reaction  is  always  less  in  the  system  with  overall  feed¬ 
back  inhibition.  Again,  the  reference  system  is  better 
protected  against  fluctuations  of  this  parameter. 

7.  For  the  special  case  of  pathways  with  two  intermediates, 
the  alternative  system  has  larger  stability  margins  than 
the  reference  system  with  overall  feedback  inhibition. 
The  more  general  case  is  discussed  below. 

From  the  above  results,  we  conclude  that  pathway  flux  is 
more  responsive  to  change  in  demand  for  the  end  product 
when  overall  feedback  inhibition  is  present  and  that  the 
concentration  of  final  product,  and  the  magnitude  of  path¬ 
way  flux,  is  less  sensitive  to  changes  in  the  parameters  of 
the  system  with  overall  feedback  inhibition. 

In each of the above results, the numerical method not only confirmed
the qualitative differences, but also showed how large the
differences were on average. In the following four types of results
the analytical approach yields either no results or ambiguous
qualitative differences, whereas the numerical approach gives
statistical regularities in either situation.

1.  The  logarithmic  gain  in  the  concentration  of  intermediates
X2  to  Xn-1  resulting  from  an  increase  in  demand  for
end  product  may  be  either  larger  or  smaller  in  the  refer¬ 
ence  system  depending  on  the  intermediate,  the  pathway 
length,  or  the  values  of  the  parameters.  The  numerical 
results  show  that,  on  average,  these  logarithmic  gains  are 
smaller  in  the  reference  system. 

2.  For  all  concentrations,  there  are  some  sensitivities  that 
may  be  either  larger  or  smaller  in  the  reference  system. 
The  numerical  approach  shows  that,  on  average,  these 
concentrations  have  smaller  aggregate  sensitivities  in  the 
reference  system.  The  differences  between  the  reference 
system  and  the  alternative  system  can  range  anywhere 
between  a  few  percent  to  fifty  percent  or  more,  depend¬ 
ing  on  the  length  of  the  pathway  and  the  concentration  of 
interest. 

3.  The  stability  margins  for  pathways  longer  than  two  re¬ 
actions  can  be  larger  in  either  the  reference  system  or  the 
alternative  system,  depending  on  the  values  of  the  pa¬ 
rameters.  Use  of  the  statistical  methodology  shows  that, 
on  average,  overall  feedback  inhibition  decreases  the 
margin  of  stability.  However,  the  differences  between 
systems  with  and  without  overall  feedback  inhibition  are, 
on  average,  less  than  3%  and  typically  less  than  5%. 

4.  The  transient  time  of  the  pathways  cannot  be  determined 
analytically.  Numerical  results  show  that  transient  times 
tend  to  be  smaller  in  pathways  with  overall  feedback 
inhibition.  Although  a  small  percentage  of  systems  with 
overall  feedback  inhibition  have  higher  transient  times, 
on  average,  overall  feedback  inhibition  decreases  tran¬ 
sient  times  in  stable  systems.  Systems  with  overall  feed¬ 
back  inhibition  can  be,  on  average,  a  few  percent  faster 
to  twice  as  fast  as  systems  without  overall  feedback 
inhibition,  depending  on  the  length  of  the  pathway. 

In addition to resolving ambiguities in the analytical comparisons,
the numerical methods allowed us to identify some general effects of
parameter values on systemic properties. We found that there is a
correlation between the values of αj (j = n, n + 1) and the values of
the aggregate sensitivities for each metabolite as well as the flux.
For very low values of αj, the aggregate sensitivities will not be
strongly affected by a change in those parameters. As these
parameters become larger than 1, a correlation develops. As the value
of αj increases, so does the aggregate sensitivity on average. The
rate constant αn+1 is a parameter that can be interpreted as the
demand for Xn. This means that, as the demand increases, so do the
aggregated sensitivities. Why this happens is not clear.

General correlations between systemic properties and kinetic-order
parameters also were identified. For example, we found that the
transient times of the pathway are inversely correlated with the
kinetic orders gi+1,i. This means that, on average, a system will
respond faster to perturbations if the kinetic orders for the
substrates of the reactions are higher. The perturbations that were
given to the systems were always positive, i.e., the substrates were
increased above their nominal steady-state values. Higher kinetic
orders with respect to substrate mean that the rate will have a
sharper response to an increase in the substrate, thus causing it to
return to the steady-state value faster. In addition to this, there
is a positive correlation between transient times and feedback
parameters. Lower magnitudes for the kinetic orders representing
inhibitory feedback make the rate less sensitive to increases in the
concentrations of its inhibitors. Thus, after an increase in
inhibitor concentrations, systems with lower magnitudes for the
feedback interaction will have faster rates than systems with high
magnitudes. It is not clear why these correlations exist only with
respect to the parameters representing feedback to the first reaction
of the pathway.

In  conclusion,  it  is  important  to  note  that  the  results 
presented  here  are  also  valid  for  simpler  patterns  of  feed¬ 
back  inhibition,  i.e.,  those  that  are  not  “fully-wired.”  If  a 
pathway  with  a  smaller  number  of  internal  feedback  inter¬ 
actions  is  considered,  the  qualitative  results  remain  the 
same.  To  be  more  specific,  the  number  of  sensitivities  that 
are  different  between  pathways  with  and  without  overall 
feedback  inhibition  may  be  smaller  for  pathways  with  less 
internal  wiring,  but  the  ones  that  are  different  remain  larger 
or  smaller  in  the  same  model  as  in  the  fully-wired  compar¬ 
ison.  This  demonstrates  the  generality  of  the  fully-wired 
case  and  the  results  provide  a  rationale  for  the  widespread 
occurrence  of  overall  feedback  inhibition  in  nature. 


APPENDIX

One could address the generic questions in this paper because the
power-law formalism is systematically structured and is thereby able
to represent systems with essentially any type of mechanism, i.e.,
the representation is mechanism independent. This is in contrast to
the Michaelis-Menten formalism, which does not have a well-defined
structure [see Savageau (1996)]. One cannot address the generic
questions examined in this paper if one insists on using the
Michaelis-Menten formalism. The following is an example illustrating
why this is the case.

Consider a special case in which one happens to know the specific
mechanisms for each reaction in the pathway. For example, assume that
all the reactions in common are governed by simple irreversible
Michaelis-Menten kinetics, in particular, that the rate law for the
degradation of the end product Xn is given by

[equation illegible in source; only the denominator term "+ Xn" survives]   (A1)

Further assume that the first enzyme has a specific cooperative
mechanism with the rate law

[equation illegible in source]   (A2)

and that a mutation eliminating inhibition by the end product results
in the following rate law for the alternative system:

[equation illegible in source]   (A3)

In general, the KM and Vm values will be different in Eqs. A2 and A3,
hence primes are used to indicate that the values will be different
in the two systems.

If one now generates the conditions for external equivalence, one
obtains the following constraint relationships after some
differentiation and algebraic manipulation:

[constraint equations missing in source]

Note that Xn0 in these expressions has a single positive real
solution given by

Xn0 = A + B,   (A6)

where

A = [expression missing in source]

and

B = [expression illegible in source]

If this solution is inserted into the constraint expressions for K'M
and V'm, one sees that they become even more complex.

These are among the simplest of assumptions regarding the
Michaelis-Menten formalism, and one can see how much more complicated
this approach is compared to the approach in the power-law formalism
[contrast Eqs. A4-A8 with Eqs. 26 and 27 in the text]. The above
expressions would be different for different mechanisms, and, when
the mechanisms are more complex, the process would become quite
impractical. Yet, one obtains the same results for the local
behavior.
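
To make the contrast between the two formalisms concrete, the following Python sketch shows how a single irreversible Michaelis-Menten rate law maps onto a local power-law term at a chosen operating point. It relies only on the standard result that the kinetic order is the logarithmic derivative of the rate, which for this rate law is KM / (KM + X). The function names and the numerical values are illustrative assumptions, not quantities taken from the appendix.

```python
def mm_rate(x, vmax, km):
    """Irreversible Michaelis-Menten rate law."""
    return vmax * x / (km + x)

def power_law_from_mm(x_op, vmax, km):
    """Local power-law approximation alpha * x**g around x_op.

    g is the logarithmic derivative d ln(v) / d ln(x) at x_op, and alpha
    is chosen so that both rate laws agree exactly at x_op.
    """
    g = km / (km + x_op)
    alpha = mm_rate(x_op, vmax, km) / x_op ** g
    return alpha, g

# Illustrative operating point: x = 1 with Vmax = 2 and KM = 1.
alpha, g = power_law_from_mm(1.0, vmax=2.0, km=1.0)
approx = alpha * 1.1 ** g            # power-law rate 10% above x_op
exact = mm_rate(1.1, vmax=2.0, km=1.0)
```

The approximation is local: the two rates agree at the operating point and drift apart as the concentration moves away from it.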

ABSTRACT It has been observed experimentally that most unbranched biosynthetic pathways have irreversible reactions
near  their  beginning,  many  times  at  the  first  step.  If  there  were  no  functional  reasons  for  this  fact,  then  one  would  expect 
irreversible  reactions  to  be  equally  distributed  among  all  positions  in  such  pathways.  Since  this  is  not  the  case,  we  have 
attempted  to  identify  functional  consequences  of  having  an  irreversible  reaction  early  in  the  pathway.  We  systematically 
varied  the  position  of  the  irreversible  reaction  in  model  pathways  and  compared  the  resulting  systemic  behavior  according 
to  several  criteria  for  functional  effectiveness,  using  the  method  of  mathematically  controlled  comparisons.  This  technique 
minimizes  extraneous  differences  in  systemic  behavior  and  identifies  those  that  are  fundamental.  Our  results  show  that  a 
pathway  with  an  irreversible  reaction  located  at  the  first  step,  and  with  all  other  reactions  reversible,  is  on  average  better  than 
an  otherwise  equivalent  pathway  with  all  reactions  reversible,  which  in  turn  is  on  average  better  than  an  otherwise  equivalent 
pathway  with  an  irreversible  reaction  located  at  any  step  other  than  the  first.  Pathways  with  an  irreversible  first  reaction  and 
low  concentrations  of  intermediates  (one  of  the  primary  criteria  for  functional  effectiveness)  exhibit  the  following  profile  when 
compared  to  fully  reversible  pathways:  changes  in  the  concentration  of  intermediates  in  response  to  changes  in  the  level  of 
initial  substrate  are  equally  low,  the  robustness  of  the  intermediate  concentrations  and  of  the  flux  is  similar,  the  margins  of 
stability  are  similar,  flux  is  more  responsive  to  changes  in  demand  for  end  product,  intermediate  concentrations  are  less 
responsive  to  changes  in  demand  for  end  product,  and  transient  times  are  shorter.  These  results  provide  a  functional  rationale 
for  the  positioning  of  irreversible  reactions  at  the  beginning  of  unbranched  biosynthetic  pathways. 


INTRODUCTION 

Several  types  of  theoretical  studies  have  reported  properties 
of  enzymes  that  could  account  for  their  selection  during  the 
evolution  of  metabolic  pathways.  The  simplest  type  in¬ 
volves  determining  the  distribution  of  parameter  values  that 
produces  the  maximal  catalytic  efficiency  of  an  isolated 
enzyme  (Fersht,  1974;  Crowley,  1975;  Albery  and 
Knowles, 1976; Cornish-Bowden, 1976; Mavrovouniotis et
al., 1990; Heinrich and Hoffman, 1991; Peterson, 1992;
1996; Wilhelm et al., 1994; Bish and Mavrovouniotis, 1998;
Heinrich  and  Schuster,  1998).  Waley  (1964)  considered  a 
three-step  pathway  with  reactions  described  by  Michaelis- 
Menten  rate  laws  and  determined  the  distribution  of  enzyme 
concentrations  that  maximizes  flux  through  the  pathway. 
Similar  studies  were  performed  for  n-step  pathways  (Schus¬ 
ter  and  Heinrich,  1987;  Klipp  and  Heinrich,  1994;  Heinrich 
and  Klipp,  1996).  Other  theoretical  studies  have  dealt  with  the 
design  of  regulatory  patterns  that,  according  to  multiple  crite¬ 
ria,  optimize  the  local  behavior  of  unbranched  biosynthetic 
pathways  with  n  steps  and  arbitrary  mechanisms  (Savageau, 
1972,  1974,  1975,  1976;  Savageau  and  Jacknow,  1979). 
An  aspect  that  has  been  less  thoroughly  studied  is  the
distribution  of  irreversible  reactions  in  unbranched  biosyn¬ 
thetic  pathways  and  how  this  distribution  might  be  related  to 
the  optimization  of  various  systemic  properties.  Although 
each  reaction  is  in  principle  reversible,  in  practice  some 
reactions  in  a  pathway  operate  far  from  thermodynamic 
equilibrium  and  are  effectively  irreversible.  It  has  been 
observed  experimentally  that,  in  most  cases,  unbranched 
biosynthetic  pathways  have  irreversible  reactions  near  the 
beginning,  many  times  at  the  first  step,  of  the  pathway  (see, 
e.g.,  EMP:http://wit.mcs.anl.gov//EMP/). 

If  there  were  no  functional  reasons  for  irreversible  reac¬ 
tions  to  be  at  the  beginning  of  a  pathway,  then  one  would 
expect  irreversible  reactions  to  be  equally  distributed  among 
all  positions  in  the  pathway.  Since  this  is  not  the  case,  we 
have  attempted  to  identify  the  functional  consequences  of 
having  an  irreversible  reaction  early  in  the  pathway.  We 
systematically  varied  the  position  of  the  irreversible  reac¬ 
tion  in  model  pathways  and  compared  the  resulting  systemic 
behavior  according  to  several  criteria  for  functional  effec¬ 
tiveness.  The  model  pathways  were  represented  by  a  power- 
law  formalism  that  faithfully  captures  their  nonlinear  be¬ 
havior,  independent  of  mechanistic  detail,  within  a  local 
neighborhood  of  an  arbitrary  steady-state  operating  point. 
We  used  the  method  of  mathematically  controlled  compar¬ 
ison  to  minimize  extraneous  differences  and  to  identify 
fundamental  differences.  With  this  approach,  we  have  been 
able  to  find  a  rationale  for  irreversible  reactions  at  the 
beginning  of  unbranched  biosynthetic  pathways. 
METHODS

Alternative  models  and  their  systemic  description 


Consider the unbranched biosynthetic pathways depicted in Fig. 1. The
initial substrate X0 is an independent variable with fixed value. The
independent variable Xn+1 represents the cell's demand for the end
product Xn. If the cell requires large amounts of Xn, then the value
of Xn+1 will be high; if small amounts of Xn are required, then the
value of Xn+1 will be low. The end product inhibits the first
reaction, as has been experimentally observed (Umbarger, 1956; Yates
and Pardee, 1956; Monod et al., 1963) and theoretically rationalized
(Alves and Savageau, 2000d). The dynamic behavior of such systems can
be described by a set of ordinary differential equations.

Assume that the net flux through the pathway is positive (i.e.,
material is coming into the system from X0, which is held constant,
and exiting the system through Xn). The net positive flux through the
reaction immediately before the intermediate Xi (considered the net
influx to the pool of Xi) can be accounted for by a single aggregate
rate law, representing either the difference between the rate laws
for the constituent forward and reverse reactions when the overall
reaction is reversible or the rate law for the forward reaction alone
when the overall reaction is irreversible. Similarly, the net
positive flux through the reaction immediately after the intermediate
Xi (considered the net efflux from the pool of Xi) can be represented
by a single aggregate rate law.

The  dynamical  behavior  of  the  models  in  Fig.  1  can  be  accurately 
described  in  a  region  about  their  nominal  steady  state  by  using  a  local 
S-system  representation  within  the  power-law  formalism  (Savageau,  1969,
1971a,  1976,  1996).  For  details  about  different  ways  to  aggregate  rate  laws 
and  approximate  them  as  S-systems,  see  Sorribas  and  Savageau  (1989). 
The  resulting  equations  are  the  following: 
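
The displayed equations are absent at this point. One form that is consistent with the parameters named for the first reaction (α1, g10, g11, g1n), with the kinetic orders described in the following paragraph, and with the definitions of bi and aij used for the steady-state solution is sketched below. It is offered as a reconstruction under those assumptions, not as the authors' own typesetting.

```latex
\begin{aligned}
\frac{dX_1}{dt} &= \alpha_1 X_0^{g_{10}} X_1^{g_{11}} X_n^{g_{1n}}
                 - \alpha_2 X_1^{g_{21}} X_2^{g_{22}} \\
\frac{dX_i}{dt} &= \alpha_i X_{i-1}^{g_{i,i-1}} X_i^{g_{ii}}
                 - \alpha_{i+1} X_i^{g_{i+1,i}} X_{i+1}^{g_{i+1,i+1}},
                 \qquad 1 < i < n \\
\frac{dX_n}{dt} &= \alpha_n X_{n-1}^{g_{n,n-1}} X_n^{g_{nn}}
                 - \alpha_{n+1} X_n^{g_{n+1,n}} X_{n+1}^{g_{n+1,n+1}}
\end{aligned}
\tag{1}
```

Equating the two terms of each equation and taking logarithms gives a system that is linear in log(Xi), which is what makes the closed-form steady state possible.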


The aggregate rate law Vi for the influx of Xi is characterized by a
multiplicative parameter (rate constant), αi, which influences the
time scale of the reaction and is always positive, and a set of
exponential parameters (kinetic orders), gij, which represent the
influence of metabolite Xj on aggregate rate law Vi. If Xj influences
the aggregate rate law Vi, either as a reactant or a modulator, and
if an increase in the concentration of Xj causes an increase in the
rate Vi, then the kinetic order will be positive. If an increase in
the concentration of Xj causes a decrease in the rate Vi, then the
kinetic order will be negative. If an increase in the concentration
of Xj causes neither an increase nor a decrease in the rate Vi, then
the kinetic order will be zero. Thus, the positive kinetic orders in
Eq. 1 are gi,i-1 (1 ≤ i ≤ n + 1), since these are the kinetic orders
for substrates of reactions. All other exponents are negative or
zero, depending on whether Xi is the product of a reversible
(gii < 0) or an irreversible (gii = 0) reaction. The fact that gii is
negative if the reaction is reversible is evident from thermodynamic
considerations. If the concentration of the product is increased, the
thermodynamic potential across the reversible reaction is reduced and
the net flux must decrease. Hence, the kinetic order gii must be
negative to represent this decrease.
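
The sign argument can be checked on a simple case. As an illustration only, assume that the ith reaction follows reversible mass-action kinetics with hypothetical forward and reverse rate constants k_f and k_r; the power-law description itself makes no such mechanistic assumption. The kinetic orders then follow from their definition as logarithmic derivatives of the net rate:

```latex
\begin{aligned}
V_i &= k_f X_{i-1} - k_r X_i, \qquad V_i > 0 \\
g_{i,i-1} &= \frac{\partial \ln V_i}{\partial \ln X_{i-1}}
           = \frac{k_f X_{i-1}}{k_f X_{i-1} - k_r X_i} > 0 \\
g_{ii} &= \frac{\partial \ln V_i}{\partial \ln X_i}
        = -\frac{k_r X_i}{k_f X_{i-1} - k_r X_i} < 0
\end{aligned}
```

Setting k_r = 0 recovers the irreversible case, with the product kinetic order equal to zero and the substrate kinetic order equal to one.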


FIGURE 1 Schematic representation of an unbranched biosynthetic
pathway subject to control by end-product inhibition. The
concentration of the initial substrate X0 is an independent variable
with fixed value; the demand for the end product Xn is represented by
Xn+1, which also is an independent variable. The reference System 0
has n fully reversible reactions. The alternative systems have one
irreversible reaction and the other reactions are identical to the
corresponding reactions in the reference system; System 1 has an
irreversible reaction at the first position; System i has an
irreversible reaction at the ith position; System n has an
irreversible reaction at the nth position.


Steady-state solution and key systemic properties

The  S-systems  describing  the  dynamic  behavior  of  the  models  in  Fig.  1  can 
be  solved  analytically  for  the  steady  state  (Savageau,  1969,  1971a),  where 
the  rates  of  production  and  consumption  for  each  metabolite  are  the  same. 
By  equating  these  rates  and  taking  logarithms  of  both  sides  of  the  resulting 
equations,  one  can  write  the  following  matrix  equation: 


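\[
\begin{bmatrix}
b_1 - g_{10} Y_0 \\
b_2 \\
\vdots \\
b_n + g_{n+1,n+1} Y_{n+1}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
Y_1 \\
\vdots \\
Y_n
\end{bmatrix}
\tag{2}
\]

where $Y_i = \log(X_i)$, $b_i = \log(\alpha_{i+1}/\alpha_i)$, and
$a_{ij} = g_{ij} - g_{i+1,j}$ for $1 \le (i, j) \le n$.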
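
Because the steady-state problem is linear in the logarithms, it can be solved with ordinary linear algebra. The following Python sketch builds the coefficients from bi = log(αi+1/αi) and aij = gij - gi+1,j and solves for the concentrations by Gaussian elimination. The function names, the dictionary layout for the kinetic orders, and the parameter values are assumptions chosen for illustration.

```python
import math

def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: no unique steady state")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y

def steady_state(alpha, g, x0, x_demand):
    """Return [X_1, ..., X_n] at steady state.

    alpha[k] holds alpha_{k+1} (k = 0..n); g[(i, j)] holds g_ij, and any
    kinetic order that is not listed is taken to be zero.
    """
    n = len(alpha) - 1

    def gij(i, j):
        return g.get((i, j), 0.0)

    a = [[gij(i, j) - gij(i + 1, j) for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    b = [math.log(alpha[i] / alpha[i - 1]) for i in range(1, n + 1)]
    b[0] -= gij(1, 0) * math.log(x0)                 # initial substrate term
    b[-1] += gij(n + 1, n + 1) * math.log(x_demand)  # demand term
    return [math.exp(y) for y in solve_linear(a, b)]

# Made-up parameters for a pathway with two intermediates (n = 2).
alpha = [2.0, 1.0, 1.0]
g = {(1, 0): 1.0, (1, 1): -0.5, (1, 2): -0.5,
     (2, 1): 1.0, (2, 2): -0.5,
     (3, 2): 1.0, (3, 3): 1.0}
x1, x2 = steady_state(alpha, g, x0=1.0, x_demand=1.0)
```

A quick consistency check is to substitute the result back into the rate laws: the three fluxes through the pathway should coincide.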

Two  types  of  coefficients,  logarithmic  gains  and  parameter  sensitivities, 
can  be  used  to  characterize  the  steady  state  of  such  models.  Logarithmic 
gains  measure  the  relative  influence  of  each  independent  variable  on  each 
dependent  variable  of  the  model  (Savageau,  1971a;  Shiraishi  and  Sav¬ 
ageau,  1992).  For  example, 


\[
L(X_i, X_0) = \frac{d \log(X_i)}{d \log(X_0)} = \frac{dY_i}{dY_0}
\tag{3}
\]

measures the percentage change in the concentration of intermediate Xi caused
by a percentage change in the concentration of the initial substrate X0.
Logarithmic  gains  provide  important  information  concerning  the  amplifi¬ 
cation  or  attenuation  of  signals  as  they  are  propagated  through  the  system. 
Parameter  sensitivities  measure  the  relative  influence  of  each  parameter  on 
each  dependent  variable  of  the  model  (Savageau,  1971b;  Shiraishi  and 
Savageau, 1992). For example,

\[
S(X_i, p_j) = \frac{d \log(X_i)}{d \log(p_j)}
\tag{4}
\]

measures the percentage change in the concentration of intermediate Xi
caused by a percentage change in the value of the parameter pj. Parameter
sensitivities  provide  important  information  about  system  robustness,  i.e., 
how  sensitive  the  system  is  to  perturbations  in  the  structural  determinants 
of  the  system.  Because  steady-state  solutions  exist  in  closed  form,  we  can 
calculate  each  of  the  two  types  of  coefficients  simply  by  taking  the 
appropriate  derivatives.  Although  the  mathematical  operations  involved 
are  the  same  in  each  case,  it  is  important  to  keep  in  mind  that  the  biological 
significance  of  the  two  types  of  coefficients  is  very  different. 
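
The two kinds of coefficients can be illustrated numerically for a pathway with two intermediates. The following Python sketch uses the closed-form log steady state (Cramer's rule on the 2 x 2 system) and central differences in log space; for rate constants and independent variables the steady state is linear in the logarithms, so the differences are exact up to rounding. The function names and parameter values are assumptions for illustration.

```python
import math

def steady_state_logs(p, y0, y3):
    """Closed-form (Y1, Y2) for a two-intermediate S-system pathway."""
    a11 = p["g11"] - p["g21"]
    a12 = p["g12"] - p["g22"]
    a21 = p["g21"]
    a22 = p["g22"] - p["g32"]
    b1 = math.log(p["alpha2"] / p["alpha1"]) - p["g10"] * y0
    b2 = math.log(p["alpha3"] / p["alpha2"]) + p["g33"] * y3
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

def log_gain_x0(p, y0=0.0, y3=0.0, h=1e-4):
    """L(X_i, X_0) = dY_i / dY_0 for i = 1, 2."""
    up = steady_state_logs(p, y0 + h, y3)
    dn = steady_state_logs(p, y0 - h, y3)
    return tuple((u - d) / (2 * h) for u, d in zip(up, dn))

def sensitivity(p, name, y0=0.0, y3=0.0, h=1e-4):
    """S(X_i, p_name): relative change in X_i per relative change in p_name.

    Works for any nonzero parameter; the parameter is scaled by exp(+/-h).
    """
    hi = dict(p)
    lo = dict(p)
    hi[name] = p[name] * math.exp(h)
    lo[name] = p[name] * math.exp(-h)
    up = steady_state_logs(hi, y0, y3)
    dn = steady_state_logs(lo, y0, y3)
    return tuple((u - d) / (2 * h) for u, d in zip(up, dn))

# Made-up parameter values.
p = dict(alpha1=2.0, alpha2=1.0, alpha3=1.0,
         g10=1.0, g11=-0.5, g12=-0.5, g21=1.0, g22=-0.5, g32=1.0, g33=1.0)
gains = log_gain_x0(p)                 # response to the initial substrate
sens_alpha1 = sensitivity(p, "alpha1") # robustness to the first rate constant
```

The same differentiation yields both quantities; what differs is whether the perturbed quantity is an independent variable (a signal) or a parameter (part of the system's structure).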

The  local  stability  of  the  steady  state  can  be  determined  by  applying  the 
Routh  criteria  (Dorf,  1992).  The  magnitude  of  the  two  critical  Routh 
conditions  can  be  used  to  quantify  the  margin  of  stability  (Savageau,  1976). 
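
A minimal Python sketch of a Routh test is given below. It computes the characteristic polynomial of a local Jacobian with the Faddeev-LeVerrier recursion and then the first column of the Routh array; the steady state is locally stable when every entry of that column is positive. The Jacobian shown is an illustrative matrix, and since the text does not spell out which two Routh conditions are treated as critical, the sketch simply returns the whole column.

```python
def char_poly(m):
    """Coefficients [1, c1, ..., cn] of det(s*I - m) (Faddeev-LeVerrier)."""
    n = len(m)
    work = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = [1.0]
    for k in range(1, n + 1):
        prod = [[sum(m[i][l] * work[l][j] for l in range(n))
                 for j in range(n)] for i in range(n)]
        c = -sum(prod[i][i] for i in range(n)) / k
        coeffs.append(c)
        work = [[prod[i][j] + (c if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return coeffs

def routh_first_column(coeffs):
    """First column of the Routh array for a polynomial in descending powers."""
    n = len(coeffs) - 1
    width = n // 2 + 1
    even = list(coeffs[0::2])
    odd = list(coeffs[1::2])
    rows = [even + [0.0] * (width - len(even)),
            odd + [0.0] * (width - len(odd))]
    for _ in range(n - 1):
        prev2, prev1 = rows[-2], rows[-1]
        if prev1[0] == 0:
            raise ZeroDivisionError("zero pivot: special-case rules needed")
        new = [(prev1[0] * prev2[k + 1] - prev2[0] * prev1[k + 1]) / prev1[0]
               for k in range(width - 1)]
        rows.append(new + [0.0])
    return [r[0] for r in rows]

def is_stable(jacobian):
    """True when all first-column Routh entries are positive."""
    return all(v > 0 for v in routh_first_column(char_poly(jacobian)))

# Illustrative Jacobian for a two-intermediate pathway.
jac = [[-1.0, 0.0], [1.0, -1.5]]
column = routh_first_column(char_poly(jac))
```

The size of the smallest entries in the column gives a rough sense of how close the system is to losing stability.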

Systems  should  respond  quickly  to  changes  in  their  environment  (Sav¬ 
ageau,  1975).  Thus,  another  key  property  of  the  systems  is  their  temporal 
response,  which  was  determined  as  follows.  At  time  zero,  each  interme¬ 
diate  concentration  was  set  to  a  value  20%  less  than  its  steady-state  value. 
The  dynamics  were  then  followed  from  this  initial  condition,  and  the  time 
for  all  the  concentrations  to  settle  to  within  1%  of  their  final  steady-state 
value  was  calculated. 
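
The procedure just described can be sketched in Python for a pathway with two intermediates. The integrator (a fixed-step fourth-order Runge-Kutta scheme), the step size, the time horizon, and all parameter values are assumptions made for illustration; the text does not say which numerical method the authors used.

```python
def net_rates(x, p):
    """dX/dt for a two-intermediate S-system pathway."""
    x1, x2 = x
    v1 = p["alpha1"] * p["x0"] ** p["g10"] * x1 ** p["g11"] * x2 ** p["g12"]
    v2 = p["alpha2"] * x1 ** p["g21"] * x2 ** p["g22"]
    v3 = p["alpha3"] * x2 ** p["g32"] * p["x3"] ** p["g33"]
    return (v1 - v2, v2 - v3)

def rk4_step(x, dt, p):
    """One classical Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(xi + scale * ki for xi, ki in zip(x, k))
    k1 = net_rates(x, p)
    k2 = net_rates(shifted(k1, dt / 2), p)
    k3 = net_rates(shifted(k2, dt / 2), p)
    k4 = net_rates(shifted(k3, dt), p)
    return tuple(xi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def settling_time(x_ss, p, drop=0.20, band=0.01, dt=0.01, t_max=50.0):
    """Time after which every concentration stays within `band` of x_ss.

    Each concentration starts `drop` (20%) below its steady-state value.
    """
    x = tuple((1.0 - drop) * s for s in x_ss)
    steps = int(round(t_max / dt))
    last_outside = None
    for k in range(steps + 1):
        if any(abs(xi / si - 1.0) > band for xi, si in zip(x, x_ss)):
            last_outside = k
        x = rk4_step(x, dt, p)
    if last_outside is None:
        return 0.0
    if last_outside == steps:
        raise RuntimeError("did not settle within t_max")
    return (last_outside + 1) * dt

# Made-up parameters; x_ss is the steady state for these values.
p = dict(alpha1=2.0, alpha2=1.0, alpha3=1.0, x0=1.0, x3=1.0,
         g10=1.0, g11=-0.5, g12=-0.5, g21=1.0, g22=-0.5, g32=1.0, g33=1.0)
x_ss = (2 ** (2 / 3), 2 ** (4 / 9))
t_settle = settling_time(x_ss, p)
```

Tracking the last time any concentration lies outside the band, rather than the first time all are inside, guards against trajectories that overshoot and leave the band again.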


Mathematically  controlled  comparison 

The  method  of  Mathematically  Controlled  Comparison  was  specifically 
developed  to  make  rigorous  comparisons  of  alternative  regulatory  designs 
(Savageau,  1972,  1996;  Irvine  and  Savageau,  1985;  Alves  and  Savageau, 
2000c,  d).  This  method  compares  alternative  designs  for  a  system  that 
performs a given function and, by using mathematical equivalence constraints
to reduce their extraneous differences, determines the irreducible
differences  between  their  systemic  behaviors.  This  method  requires  closed- 
form  solutions  for  the  steady  state,  which,  as  noted  above,  can  be  obtained 
with  the  local  S-system  representation.  Important  functional  constraints  are 
introduced  by  equating  relevant  steady-state  properties  of  the  alternative 
systems  being  compared.  Further  analysis  (dynamic  as  well  as  steady-state) 
is  performed  and  a  profile  of  ratios  is  constructed  for  corresponding  results 
from  the  alternative  systems.  In  some  cases,  a  ratio  can  be  determined 
analytically  to  be  less  than,  equal  to,  or  greater  than  unity.  For  example,  if 
the  ratio  of  values  for  some  property  P  in  a  reference  system  to  the  same 
property  in  an  alternative  system  is  larger  than  unity,  then  the  reference 
system  can  always  be  made  to  have  a  larger  value  for  P,  no  matter  how 
large  the  value  for  P  in  the  alternative  system. 

However,  if  one  wishes  to  know  how  much  greater  than  unity  a  given 
ratio  is,  then  one  needs  to  know  actual  parameter  values.  These  parameter 
values  are  not  always  available;  if  they  are  available,  they  are  not  always 
accurate.  Moreover,  there  are  cases  in  which  the  ratio  can  be  less  than  or 
greater  than  unity,  depending  on  the  specific  values  for  the  parameters,  so 
Mathematically  Controlled  Comparisons  that  use  actual  parameter  values 
may  lack  analytical  generality. 

In  this  work  we  use  our  method  (Alves  and  Savageau,  2000c),  which  is 
a  generalization  of  the  original  analytical  method  for  making  mathemati¬ 
cally  controlled  comparisons;  it  includes  numerical  comparisons  in  which 
statistical  techniques  (Alves  and  Savageau,  2000a)  yield  results  that  are 
general  in  a  statistical  sense.  We  compare  the  systemic  performance  of  a 
fully  reversible  pathway  (Fig.  1,  System  0)  with  that  of  pathways  in  which 
only one of the reactions is irreversible (Fig. 1, System 1 to System n). We
consider all possible positions for the irreversible reaction in pathways with
2 to 7 reactions. The system in which each reaction of the pathway is
reversible will be referred to as the reference system or System 0, and the
otherwise equivalent system in which the ith reaction of the pathway is
irreversible will be referred to as an alternative system or System i. This
method  also  allows  direct  comparison  of  System  i  and  System  j,  each  of 
which  has  an  irreversible  reaction  but  in  different  positions. 
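
The statistical side of such a comparison can be sketched as a sampling loop: draw a parameter set for the reference system, derive the equivalent alternative system, and record the ratio of a chosen property between the two. In the Python sketch below, `sample_reference`, `to_alternative`, and `prop` are hypothetical callbacks supplied by the user; in particular, `to_alternative` stands in for the internal and external equivalence constraints, which are not implemented here.

```python
import random
from statistics import median

def ratio_profile(sample_reference, to_alternative, prop,
                  n_samples=1000, seed=0):
    """Median ratio prop(reference) / prop(alternative) over random samples,
    together with the share of samples in which the ratio exceeds one."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)          # fixed seed for reproducibility
    ratios = []
    for _ in range(n_samples):
        ref = sample_reference(rng)    # random reference parameter set
        alt = to_alternative(ref)      # equivalent alternative system
        ratios.append(prop(ref) / prop(alt))
    share_above_one = sum(r > 1.0 for r in ratios) / n_samples
    return median(ratios), share_above_one
```

A ratio that lies on one side of unity for every sample corresponds to an analytically unambiguous difference, whereas a mixed outcome is summarized by the median and the share above one.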


Internal  and  external  equivalence 

We  are  concerned  with  the  irreducible  differences  in  systemic  behavior 
between  two  pathways  of  reversible  reactions  that  differ  only  by  the 
existence  of  one  irreversible  reaction  in  a  pathway  where  the  other  has  a 
reversible  reaction.  By  irreducible  differences  we  mean  differences  that 
persist  no  matter  what  the  values  are  for  the  parameters  that  define  the 
systems.  It  is  therefore  important  to  ensure  that  all  other  changes  in 
systemic  behavior  are  eliminated  to  the  extent  possible.  To  achieve  this 
aim,  we  shall  require  that  the  reference  and  alternative  systems  be  equiv¬ 
alent  from  both  an  internal  and  external  perspective  (Savageau,  1972, 
1976;  Irvine  and  Savageau,  1985). 

By internal equivalence we mean that the values of the corresponding
parameters for all the unchanged reactions are the same in both the
reference and alternative systems. By external equivalence we mean that
systemic behaviors of the reference and alternative systems are made
identical, which leads to constraints upon the values for the parameters of
the changed reaction. For example, consider the reference system (Fig. 1,
System 0) and an alternative system in which the first reaction is
irreversible (Fig. 1, System 1). The parameters that characterize the first
reaction of the pathway will differ in general between these two systems.
The parameters $\alpha_1$, $g_{10}$, $g_{11}$, and $g_{1n}$ of System 0
become the parameters $\alpha'_1$, $g'_{10}$, $g'_{11} = 0$, and $g'_{1n}$
of System 1. Since we wish to determine the necessary systemic effects that
are due to the change from reversibility to irreversibility, we shall
specify values for the parameters $\alpha'_1$, $g'_{10}$, and $g'_{1n}$ that
eliminate as many extraneous systemic effects as possible. This is
accomplished by deriving mathematical expressions for a given steady-state
property in each of the two models, equating these expressions to produce
a constraint equation, and then solving the constraint equation for one of
the primed parameters in terms of the unprimed parameters. When all
primed parameters have been specified in this fashion, there will be no
more degrees of freedom with which to make systemic properties equivalent
between the two models, and the two systems will be maximally
equivalent from an external perspective.
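
As a worked illustration of the equate-and-solve step, consider a two-step
pathway in which the second reaction is made irreversible. The rate laws
below are an assumption made for this sketch (power-law forms, with the
demand variable for end product normalized to 1); they are not quoted from
the text above. With $y_i = \log X_i$:

```latex
% Assumed rate laws (illustrative only):
%   reference:   V_2  = \alpha_2  X_1^{g_{21}} X_2^{g_{22}},  V_3 = \alpha_3 X_2^{g_{32}}
%   alternative: V'_2 = \alpha'_2 X_1^{g'_{21}}               (g'_{22} = 0)
\begin{align*}
V_2 = V_3 &\;\Rightarrow\;
  y_1 = \frac{\log(\alpha_3/\alpha_2) + (g_{32} - g_{22})\, y_2}{g_{21}}
  && \text{(reference)} \\
V'_2 = V_3 &\;\Rightarrow\;
  y_1 = \frac{\log(\alpha_3/\alpha'_2) + g_{32}\, y_2}{g'_{21}}
  && \text{(alternative)}
\end{align*}
% Equating the slopes in y_2, then the intercepts:
\begin{align*}
g'_{21} &= \frac{g_{21}\, g_{32}}{g_{32} - g_{22}}, \\
\log \alpha'_2 &= \frac{g_{32} \log \alpha_2 - g_{22} \log \alpha_3}{g_{32} - g_{22}}.
\end{align*}
```

Under these assumptions the relation between $X_1$ and $X_2$ imposed by the
second step is identical in the two systems, so no degrees of freedom remain
for the changed reaction.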

Calculating  the  constraints  for  external  equivalence 

We  require  the  reference  and  alternative  systems  in  Fig.  1  to  have  the  same 
steady-state  logarithmic  gains  with  respect  to  the  initial  substrate  of  the 
pathway  and  the  same  concentrations  (and  thus  flux).  These  two  types  of 
constraints are sufficient to fix the two primed parameters of the
irreversible reaction when its position is beyond the first step.

When  the  position  of  the  irreversible  reaction  is  at  the  first  step,  there 
are  three  primed  parameters  that  need  to  be  fixed  (see  previous  section). 
For  the  third  constraint  we  require  the  reference  and  alternative  systems  in 
Fig. 1 to have the same sensitivity of the concentrations with respect to
changes in the parameter $\alpha_1$. This constraint is preferred over other
possibilities because the reference system and alternative system will then
exhibit  the  smallest  number  of  systemic  differences,  which  is  the  objective 
in  a  mathematically  controlled  comparison.  One  could  choose  a  different 
systemic  property  to  form  the  third  constraint.  However,  the  reference 
system  and  alternative  system  would  then  exhibit  a  larger  number  of 
systemic  differences,  some  of  which  could  be  eliminated  by  the  choice  of 
the  preferred  constraint. 

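Thus, the following system of algebraic equations is solved to obtain the
analytic constraints for the primed parameters of the irreversible reaction
at the ith step:

$$L(X_i, X_0)_{\mathrm{Reference}} = L(X_i, X_0)_{\mathrm{Alternative}}, \qquad 1 \le i \le n \tag{5a}$$

$$S(X_i, \alpha_1)_{\mathrm{Reference}} = S(X_i, \alpha'_1)_{\mathrm{Alternative}}, \qquad i = 1 \tag{5b}$$

$$\log[X_i]_{\mathrm{Reference}} = \log[X_i]_{\mathrm{Alternative}}, \qquad 1 \le i \le n \tag{5c}$$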

By constraining one of the logarithmic gains (Eq. 5a), all of them are
constrained. This allows us to fix the kinetic order $g'_{i,i-1}$. When the
irreversible reaction occurs at the first step, the additional constraint
(Eq. 5b) allows us to fix the kinetic order $g'_{1n}$. By constraining one
of the concentrations (Eq. 5c), all of them, as well as the steady-state
flux, are constrained. This allows us to fix the rate constant $\alpha'_i$.

The parametric constraints obtained by solving Eq. 5 have the following
form:

$$g'_{i,i-1} = g_{i,i-1}\, f(g, n)$$

$$g'_{1n} = g_{1n} + f_n(g, n) \tag{6}$$

$$\log(\alpha'_i) = f_\alpha(\alpha, g, n)$$

where the parameters $\alpha$ and $g$ in the functions $f$ are intended to
represent a set of rate constants and kinetic orders that depend both on the
length of the pathway and on the systems being considered. The specific
forms of these constraints are presented in the Appendix for n = 2 to n = 7.
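
As a numerical sanity check on constraints of this kind, the sketch below
uses a two-step pathway with the second reaction irreversible. The rate
laws, the normalization of the demand variable to 1, the parameter values,
and all function names are assumptions made for illustration. The primed
parameters are those obtained by equating the steady-state relations of the
two systems under these assumptions.

```python
import math

def steady_state(b1, b2, b3, g10, g11, g12, g21, g22, g32, y0):
    """Solve V1 = V2 = V3 in log coordinates for (y1, y2), with y_i = log X_i.

    Assumed rate laws (illustrative, demand variable normalized to 1):
      V1 = a1 * X0**g10 * X1**g11 * X2**g12
      V2 = a2 * X1**g21 * X2**g22
      V3 = a3 * X2**g32
    with b_i = log a_i.
    """
    # Linear system A @ (y1, y2) = r obtained from V1 = V2 and V2 = V3.
    a11, a12, r1 = g11 - g21, g12 - g22, b2 - b1 - g10 * y0
    a21, a22, r2 = g21, g22 - g32, b3 - b2
    det = a11 * a22 - a12 * a21
    y1 = (r1 * a22 - a12 * r2) / det  # Cramer's rule
    y2 = (a11 * r2 - a21 * r1) / det
    return y1, y2

def log_gains(params, y0, h=1e-6):
    """Central-difference estimate of d log X_i / d log X_0 for i = 1, 2."""
    up = steady_state(*params, y0 + h)
    down = steady_state(*params, y0 - h)
    return tuple((u - d) / (2 * h) for u, d in zip(up, down))

def make_system_2(b1, b2, b3, g10, g11, g12, g21, g22, g32):
    """Alternative system: second reaction irreversible (g22' = 0)."""
    g21_primed = g21 * g32 / (g32 - g22)
    b2_primed = (g32 * b2 - g22 * b3) / (g32 - g22)
    return (b1, b2_primed, b3, g10, g11, g12, g21_primed, 0.0, g32)

# Hypothetical parameter values for the reference system.
reference = (math.log(2.0), math.log(1.5), math.log(1.0),
             0.5, -0.2, -0.5, 0.8, -0.3, 0.6)
alternative = make_system_2(*reference)

y0 = math.log(1.5)
ss_ref = steady_state(*reference, y0)
ss_alt = steady_state(*alternative, y0)
gains_ref = log_gains(reference, y0)
gains_alt = log_gains(alternative, y0)
```

With these values both systems give the same steady-state concentrations and
the same logarithmic gains with respect to the initial substrate, which is
what external equivalence requires.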

Numerical  analysis 

The  analytical  results  give  qualitative  information  that  characterizes  the 
effect  of  irreversibility  in  the  systems  of  Fig.  1.  To  obtain  quantitative 
information,  one  must  introduce  specific  values  for  the  parameters  and 
compare  systems.  For  this  purpose  we  have  randomly  generated  a  large 
ensemble  of  parameter  sets  and  selected  5000  of  these  sets  that  define 
systems  consistent  with  various  physical  and  biochemical  constraints. 
These constraints include mass balance, low concentrations of intermediates
and small changes in their values to minimize utilization of the solvent
capacity in the cell, small values for parameter sensitivities so as to
desensitize the system to spurious fluctuations affecting its structure, and
stability margins large enough to ensure local stability of the systems. A
detailed  description  of  these  methods  can  be  found  in  Alves  and  Savageau 
(2000b).  Mathematica  (Wolfram,  1997)  was  used  for  all  the  numerical 
procedures. 
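
The select-from-a-random-ensemble step can be sketched as simple rejection
sampling. The sampling ranges, the acceptance test, and the function names
below are hypothetical stand-ins chosen for illustration; the actual
constraints are those described in Alves and Savageau (2000b), and the
original computations were done in Mathematica, not Python.

```python
import random

def sample_accepted(generate, accept, count, rng, max_draws=1_000_000):
    """Draw candidates with generate(rng) and keep those that pass accept()."""
    kept = []
    for _ in range(max_draws):
        candidate = generate(rng)
        if accept(candidate):
            kept.append(candidate)
            if len(kept) == count:
                return kept
    raise RuntimeError("too few acceptable parameter sets were found")

def generate_two_step(rng):
    """Hypothetical sampling ranges for three kinetic orders."""
    return {
        "g21": rng.uniform(0.1, 2.0),    # substrate kinetic order, positive
        "g22": rng.uniform(-2.0, -0.1),  # product kinetic order, negative
        "g32": rng.uniform(0.1, 2.0),
    }

def accept_two_step(p):
    """Hypothetical filter: keep sets whose derived kinetic order is modest."""
    g21_primed = p["g21"] * p["g32"] / (p["g32"] - p["g22"])
    return g21_primed < 1.0

rng = random.Random(0)  # fixed seed so the ensemble is reproducible
ensemble = sample_accepted(generate_two_step, accept_two_step, 50, rng)
```

In practice the acceptance test would combine all of the physical and
biochemical constraints listed above, and `count` would be 5000.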


Density  of  ratios  plot 

To interpret the ratios that result from our analysis, we use Density of
Ratios plots as defined in Alves and Savageau (2000a). The primary
density plots from the raw data have the magnitude for some property of
the reference system on the x-axis and the corresponding ratio of
magnitudes (reference system to alternative system) on the y-axis. The
primary plot can be viewed as a list of 5000 paired values that can be
ordered with respect to the reference magnitude to form a list $L_1$ in
which the first pair has the lowest measured value for property P in the
reference model, the second has the second lowest, and so on. Secondary
density plots are constructed from the primary plots by the use of moving
quantile techniques with a window size of 500. The procedure is as follows.
One collects the first 500 ratios from the list $L_1$, calculates the
quantile of interest for this sample, and pairs this number
$\langle R \rangle$ with the median value of the corresponding P values of
the reference model, denoted $\langle P \rangle$. One advances the window by
one position, collects ratios 2 through 501, calculates $\langle R \rangle$,
and pairs it with the corresponding $\langle P \rangle$ value and continues
in this manner until the last ratio from the list $L_1$ is used for the
first time. This procedure generates a second list $L_2$ and the
corresponding secondary plot. The slope in the secondary plot measures the
degree of correlation between the quantities plotted on the x- and y-axes.
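
The moving-quantile construction can be sketched in a few lines. This is a
minimal illustration, not the authors' Mathematica code: the function names
are hypothetical, and the linear-interpolation rule used for the quantile is
an assumption.

```python
import math
from statistics import median

def quantile(values, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac

def secondary_plot(pairs, window=500, q=0.5):
    """Build the secondary list of (<P>, <R>) points from (P, ratio) pairs.

    pairs  : iterable of (P, ratio) values from the primary plot
    window : number of consecutive pairs per sample (500 in the text)
    q      : quantile of interest for the ratios (the median by default)
    """
    l1 = sorted(pairs, key=lambda pr: pr[0])  # order by reference magnitude
    l2 = []
    # The last window is the first one that contains the last ratio.
    for start in range(len(l1) - window + 1):
        sample = l1[start:start + window]
        p_mid = median(p for p, _ in sample)
        r_q = quantile([r for _, r in sample], q)
        l2.append((p_mid, r_q))
    return l2

# Tiny synthetic example: 10 pairs, window of 4.
demo = secondary_plot([(i, 2.0 * i) for i in range(10)], window=4)
```

A list of N pairs yields N - window + 1 points, so 5000 pairs with a window
of 500 give 4501 points in the secondary plot.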

Mathematically  controlled  comparison 

Several  criteria  are  considered  to  determine  the  functional  effectiveness  of 
unbranched  biosynthetic  pathways  (Savageau,  1976;  Alves  and  Savageau, 
2000d).  The  systems  being  compared  will  be  equal  on  the  bases  of  the  first 
two  criteria  because  of  external  equivalence  constraints,  whereas  they  will 
differ  with  respect  to  the  remaining  five  criteria. 

1. The concentration of intermediates should be low, because otherwise it
would tax the limited solvent capacity of the cell and potentially
interfere in a nonspecific way with unrelated reactions (e.g., Atkinson,
1969; Savageau, 1972; Srere, 1987; see Levine and Ginsburg, 1985, for
a general discussion of the subject from different perspectives). Due to
the conditions for external equivalence that we shall impose, the
concentrations of the corresponding intermediates will be the same for all
comparable systems being examined.

2.  The  changes  in  concentration  of  intermediates  caused  by  changes  in  the 
initial  substrate  should  be  small.  This  also  will  ensure  that  the  solvent 
capacity  is  not  exceeded  when  the  concentration  of  intermediates 
changes.  Again,  due  to  the  conditions  for  external  equivalence,  the 
corresponding  logarithmic  gains  will  be  the  same  for  all  the  systems 
being examined. These changes are quantified by means of the
logarithmic-gain factors $L(X_i, X_0)$ as defined in Eq. 3.

3.  The  systems  should  be  robust,  i.e.,  the  concentrations  and  flux  should  be 
insensitive  to  changes  in  the  parameters  that  define  the  structure  of  the 
system  (Savageau,  1971b;  Shiraishi  and  Savageau,  1992).  If  these 
sensitivities  are  high,  then  small  fluctuations  in  parameter  values  (e.g., 
due  to  physical  changes  such  as  temperature  or  to  errors  in  replication, 
transcription,  or  translation)  would  lead  to  large  deviations  from  the 
normal behavior of the system. These changes are quantified by means
of the parameter sensitivities $S(X_i, p_j)$ and $S(V, p_j)$ as defined in
Eq. 4. Aggregate sensitivities for intermediate concentrations and flux are
defined as follows: $S(X_i) = \sqrt{\sum_j S(X_i, p_j)^2}$ and
$S(V) = \sqrt{\sum_j S(V, p_j)^2}$.

4.  The  systems  should  have  a  steady  state  that  is  dynamically  stable 
following  small  perturbations  in  the  concentration  variables,  otherwise 
they  would  be  dysfunctional,  i.e.,  unable  to  maintain  homeostasis  in  the 
face of spurious perturbations. Furthermore, the margins of stability
should  be  sufficiently  large  that  changes  in  parameter  values  will  not 
produce  an  unstable  steady  state.  There  are  n  Routh  conditions  that 
determine  whether  the  steady  state  of  a  system  with  n  variables  will  be 
stable.  The  margins  of  stability  are  quantified  by  the  size  of  the  critical 
Routh  conditions,  which  are  the  last  two  (Savageau,  1976;  Hlavacek  and 
Savageau,  1997). 

5.  The  flux  through  the  pathway  should  be  highly  responsive  to  changes  in 
the  demand  for  end  product.  This  ensures  that  the  amount  of  material 
flowing  through  the  pathway  is  tightly  coupled  to  the  needs  of  cellular 
metabolism. This criterion is quantified by the logarithmic gain
$L(V, X_{n+1})$, as defined in Eq. 3.

6.  The  changes  in  concentration  of  the  intermediates  caused  by  changes  in 
demand  for  the  end  product  should  be  small.  This  ensures  that  the 
depletion  of  end  product  is  minimized  when  there  is  an  increase  in 
demand.  It  also  ensures  that  the  solvent  capacity  is  not  exceeded  by  the 
intermediates  when  demand  for  the  end  product  changes.  These  changes 
are quantified by means of the logarithmic-gain factors $L(X_n, X_{n+1})$
and $L(X_j, X_{n+1})$ as defined in Eq. 3.

7.  The  systems  should  respond  quickly  to  changes  in  their  environment, 
i.e.,  they  should  have  short  transient  times  (Savageau,  1975).  Organisms 
harboring  systems  with  a  sluggish  response  to  change  will  be  at  a 
disadvantage when competing with other organisms in a rapidly changing
environment. Transient time will be measured as the time it takes
the  system  to  return  to  its  steady  state  after  a  small  perturbation  in 
concentrations. 
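
The aggregate sensitivity used in criterion 3 is the Euclidean norm of the
individual parameter sensitivities of one variable. A minimal sketch, with a
hypothetical function name and made-up sensitivity values:

```python
import math

def aggregate_sensitivity(sensitivities):
    """Square root of the sum of squared sensitivities of one variable."""
    return math.sqrt(sum(s * s for s in sensitivities))

# Hypothetical sensitivities of one concentration to three parameters.
example = aggregate_sensitivity([0.3, -0.4, 1.2])

# Ratio of aggregate sensitivities, reference relative to alternative,
# for two hypothetical sets of values.
ratio = aggregate_sensitivity([0.3, -0.4]) / aggregate_sensitivity([0.6, -0.8])
```

A ratio below 1 would indicate that the reference system is the more robust
of the two for that variable.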


RESULTS 

In all the results described below, the reference and alternative
systems have the same steady-state values for the
flux through the pathway, the same concentrations of the
corresponding metabolites, and the same logarithmic gains
for pathway flux and for metabolite concentrations in response
to changes in the initial substrate. These equivalent
behaviors  are  a  direct  consequence  of  the  constraints  for 
internal  and  external  equivalence,  as  described  above  in 
Methods.  The  reference  and  alternative  systems  differ  on 
the  basis  of  their  robustness,  margin  of  stability,  response  to 
demand  for  end  product,  and  transient  time. 

Robustness 

We  compare  the  robustness  of  the  reference  system  having 
all  reversible  reactions  with  that  of  an  otherwise  equivalent 
alternative  system  having  one  irreversible  reaction  in  all 
possible positions. In most cases, symbolic analysis is sufficient
to determine whether the ratio of a given parameter
sensitivity  in  the  reference  system  to  the  corresponding 
sensitivity  in  the  alternative  system  is  larger  or  smaller  than 
1;  in  the  remaining  cases,  symbolic  analysis  is  incapable  of 
determining  the  value  for  the  ratio  because  it  depends  on  the 
specific  values  of  the  parameters.  Results  of  the  symbolic 
analysis  are  summarized  in  Table  1  for  pathways  of  length 
2  to  7.  The  following  patterns  can  be  observed  in  the  data. 

The  reference  system  is  always  more  robust  than  the 
alternative  system  with  an  irreversible  synthesis  of  the  end 
product,  because  the  ratios  of  parameter  sensitivities  are  all 
less than or equal to 1. As the position of the irreversible
reaction approaches the beginning of the pathway, the number
of sensitivities that are equal in the systems being
compared  decreases.  The  concentration  of  the  product  of  the 
irreversible  reaction  is  always  more  sensitive  to  parameter 
changes  than  the  product  of  the  corresponding  reversible 
reaction  in  the  reference  system. 
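
The four outcomes tallied in Table 1 were obtained symbolically. As a rough
numerical stand-in, one can evaluate a sensitivity ratio over many parameter
sets and sort the result into the same categories. The function below is a
hypothetical sketch of that idea; sampling can suggest a category but, unlike
the symbolic analysis, cannot prove it.

```python
def classify_ratio(ratios, tol=1e-9):
    """Assign sampled sensitivity ratios to a Table 1 style category.

    Returns ">1", "<1", "=1", or "?" when the ratio falls on both sides
    of 1, i.e., when the outcome depends on the parameter values.
    Samples equal to 1 within tol do not change the category.
    """
    above = any(r > 1 + tol for r in ratios)
    below = any(r < 1 - tol for r in ratios)
    if above and below:
        return "?"
    if above:
        return ">1"
    if below:
        return "<1"
    return "=1"
```

For example, ratios sampled as `[0.5, 2.0]` fall in the indeterminate
category, whereas `[0.5, 0.9]` would be counted under "<1".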

In general, numerical methods are needed to decide
which systems are more robust because this cannot be done
by examining just the symbolic sensitivities. The numerical
results in Fig. 2 A show that the aggregate sensitivity of $X_j$
to parameters is on average the same in the reference and
alternative systems if $X_j$ is surrounded by reversible
reactions. If either the reaction that produces or the reaction that
consumes $X_j$ is irreversible, then that concentration is on
average more robust in the reference system. Fig. 2 B shows
that, on average, the reference System 0 has smaller aggregate
sensitivities for flux than alternative Systems i. However,
these differences are only significant for alternative
Systems 1 and n.


Margin  of  stability 

Comparing  System  0  with  System  i  shows  that  the  stability 
margins  for  systems  with  2  reactions  are  always  larger  in  a 
reference  System  0.  For  systems  with  3  to  7  reactions,  these 
margins  can  be  larger  in  either  system.  Direct  comparison  of 
System  i  with  System  j  shows  that  the  stability  margins  can 
be  larger  in  either  system,  depending  on  the  parameter 
values. 

Numerical results show that, on average, the reference
System 0 has larger margins of stability than alternative
Systems i (i > 1). Numerical results also show that, on
average, the reference System 0 has larger margins of
stability than System 1, although the differences are
insignificant (Fig. 2 C).


Response  to  demand  for  end  product 

Symbolic  comparisons  with  the  reference  system  show  that 
the  flux  through  System  1  is  more  responsive  to  changes  in 
the  demand  for  end  product  than  is  the  flux  through  System 
0.  However,  for  i  >  1,  the  flux  through  System  0  is  more 
responsive  to  changes  in  the  demand  for  end  product  than  is 
the  flux  through  System  i.  This  demonstrates  that,  with 
respect  to  this  systemic  property,  System  1  is  better  than 
System  0  and  better  than  any  of  the  other  alternatives.  Direct 
comparison  of  Systems  i  and  j  with  respect  to  this  systemic 
property  reveals  additional  information.  If  i,j  >  1,  then  the 
flux  through  Systems  i  and  j  is  equally  responsive  to 
changes  in  the  demand  for  end  product. 
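
The responsiveness compared here is a logarithmic gain. Assuming the usual
definition, the derivative of the logarithm of a dependent quantity with
respect to the logarithm of an independent variable, it can be estimated
numerically as in the sketch below. The function name and the example
power-law relation are hypothetical.

```python
import math

def log_gain(f, x, rel_step=1e-6):
    """Estimate d log f / d log x at x by a central difference in log space."""
    up = f(x * math.exp(rel_step))
    down = f(x * math.exp(-rel_step))
    return (math.log(up) - math.log(down)) / (2 * rel_step)

# Hypothetical power-law dependence of flux on demand: V = 3 * X**0.4.
# For a pure power law the logarithmic gain equals the exponent.
gain = log_gain(lambda x: 3.0 * x ** 0.4, 2.0)
```

A larger gain of flux with respect to demand corresponds to a pathway that
is more tightly coupled to the needs of the cell (criterion 5).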

Numerical results (Fig. 2 D) show that average differences
between the reference System 0 and alternative System 1
are about 120%, whereas the differences between the
reference System 0 and alternative Systems i (i > 1) are, on
average, less than 2%.

TABLE 1 Comparison of parameter sensitivities for the reference and alternative systems as a function of pathway length and of
position for the irreversible step in the pathway

(Table body: one section per position i of the irreversible reaction, with rows V and X_1 to X_n and, for each pathway length
n = 2 to 7, the four columns >1, <1, =1, and ?.)

The sensitivities of the steady-state flux (V) through the pathway and of the steady-state concentrations (X_i) are calculated
with respect to each of the parameters in the reference system and in the alternative system. The ratio of a given sensitivity in
the reference system relative to the corresponding sensitivity in the alternative system is determined to be greater than one, less
than one, equal to one, or indeterminate. The number of reactions in the pathway, n, varies from 2 to 7; the position of the
irreversible reaction in the pathway, i, varies from 1 to n. The ratios are the values of the parameter sensitivities for reference
System 0 relative to those for alternative Systems i (see Fig. 1). Column legend: >1, number of sensitivities that are larger in
reference System 0; <1, number of sensitivities that are smaller in reference System 0; =1, number of sensitivities that are the
same in both systems under comparison; ?, number of sensitivities that can be larger in either system, depending on parameter
values. For example, the number 5 at the 3rd row, 1st column position of the i = 1, n = 2 section of the table means that there are
five different parameters in a two-step pathway for which the sensitivities of X_2 are larger in System 0 than in System 1.

The end-product concentration in System 1 is less responsive
to changes in the demand for end product than is the
end product in System 0. However, for i > 1, the end
product concentration in System 0 is less responsive to
changes in the demand for end product than is the end
product in System i. Again, System 1 is better than System
0 and better than any of the other alternatives.

Numerical results (Fig. 2 E) show that average differences
between the reference System 0 and alternative System 1
can be between 50 and 100%, whereas the differences
between the reference System 0 and alternative Systems i
(i > 1) are, on average, much smaller (2-8%).


Transient  time 

There is no explicit solution for the dynamic equations
given in Eq. 1 that would allow one to determine symbolically
the transient responses of the various systems in Fig. 1.
The numerical results in Fig. 2 F show that the transient time
for alternative Systems i (i < n) is, on average, smaller than
that for the reference System 0, whereas the transient time for
alternative System n is larger than that for the reference System
0. A direct comparison of System i and System j (i, j < n)
shows that the transient time can be larger in either system,
depending on the length of the pathway (data not shown).
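
One way to turn a simulated trajectory into a transient time is to record
the first time after which a concentration stays within a small band around
its steady state. The sketch below is hypothetical: the 1% band, the
function name, and the synthetic relaxation curve are assumptions, not the
procedure used to produce Fig. 2 F.

```python
import math

def transient_time(times, values, steady, tol=0.01):
    """First time after which |value - steady| stays within tol * |steady|.

    times, values : sampled trajectory of one concentration
    steady        : its steady-state value
    tol           : relative width of the band (an assumed threshold)
    Returns None if the trajectory never settles inside the band.
    """
    band = tol * abs(steady)
    settled_at = None
    for t, v in zip(times, values):
        if abs(v - steady) <= band:
            if settled_at is None:
                settled_at = t
        else:
            settled_at = None  # left the band again, so restart
    return settled_at

# Synthetic relaxation after a 10% perturbation: x(t) = 1 + 0.1 * exp(-t).
times = [k * 0.01 for k in range(1000)]
values = [1.0 + 0.1 * math.exp(-t) for t in times]
t_settle = transient_time(times, values, steady=1.0)
```

For this synthetic curve the band is entered once the deviation falls below
0.01, i.e., shortly after t = ln 10.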

Correlations  between  ratios  and 
systemic  properties 

The aggregate sensitivities of the concentrations in System
i on average approach those in System 0 as the concentrations
of intermediates decrease, i.e., the ratio of aggregate
sensitivities approaches 1 (Fig. 2 A). The ratio for aggregate
sensitivities of flux in System 0 and System 1 also approaches
1, whereas the same ratio in System 0 and Systems
i (i > 1) decreases away from 1 (Fig. 2 B). Thus, the
differences in robustness (criterion 3) in System 0 and
System 1 become less significant, whereas the differences in
System 0 and Systems i (i > 1) become more significant at
low concentrations of intermediates, which is our first
criterion for functional effectiveness.

The ratios involving the critical margins of stability can
be positively or negatively correlated with the concentrations
of intermediates, depending on the particular comparison
(Fig. 2 C). There is no general pattern apparent in this
panel, so these correlations provide no further information
regarding criterion 4.

The ratios for System 0 relative to System 1 of logarithmic
gains in flux with respect to changes in the demand for end
product are positively correlated with low concentrations of
intermediates, although the slope for this correlation is small.
The same ratios, but for System 0 relative to System i (i > 1),
are negatively correlated with low concentrations of intermediates,
although the slope for this correlation is also small (Fig.
2 D). Thus, the differences in responsiveness of flux to changes
in demand for end product (criterion 5) in System 0 and
System 1, and in System 0 and System i (i > 1), become more
significant at low concentrations of intermediates.

The  ratios  involving  logarithmic  gains  in  end  product 
concentration  with  respect  to  changes  in  the  demand  for  end 
product  are  positively  correlated  with  the  concentrations  of 
intermediates  (Fig.  2  E).  Thus,  the  differences  in  depletion 
of  end  product  following  an  increase  in  demand  for  end 
product  (criterion  6)  in  System  0  and  System  1  become  less 
significant  at  low  concentrations  of  intermediates. 

FIGURE 2 Typical correlation curves between ratios of magnitudes in reference System 0 relative to those in alternative Systems i versus concentrations
of intermediates. The data, which are generated by changing all of the parameter values randomly within the constraints described in the Methods section,
are displayed in a density of ratios plot (Alves and Savageau, 2000a). The y-axis indicates which of two systems on average has the larger magnitude; the
x-axis indicates how this difference changes as a function of the concentration of intermediates (see criterion 1 in the text). The subscripts j and k refer to
arbitrary pathway intermediates, which have different concentrations in general. We have made individual plots for each pathway length and combination
of intermediates. However, since the trends observed for different pathway lengths and intermediates are the same, we show only representative examples.
(A) Ratios of aggregate sensitivities of concentrations: a, aggregate sensitivities of metabolites that have both their production and consumption catalyzed
by reversible reactions; b and c, aggregate sensitivities of metabolites that have either their production or consumption catalyzed by an irreversible reaction.
(B) Ratios of aggregate sensitivities of flux: a, ratio for reference System 0 relative to alternative Systems i (1 < i < n); b, ratio for reference System 0
relative to alternative System n; c, ratio for reference System 0 relative to alternative System 1. (C) Ratios of critical criteria for local stability. (D) Ratios
of logarithmic gains in concentration with respect to changes in demand for the end product: a, ratio for reference System 0 relative to alternative System
1; b, ratio for reference System 0 relative to alternative System i (i > 1). (E) Ratios of logarithmic gains in flux with respect to changes in demand for the
end product: a, ratio for reference System 0 relative to alternative Systems i (i > 1); b, ratio for reference System 0 relative to alternative System 1. (F)
Ratios of transient times: a and b, ratio for reference System 0 relative to two different alternative Systems i (i < n); c, ratio for reference System 0 relative
to alternative System n.


The ratios involving transient times are inversely correlated
with the concentrations of intermediates (Fig. 2 F).
Thus, the difference in transient times (criterion 7) in System
0 and Systems i (i < n) increases as the concentration
of intermediates decreases, whereas this difference in System
0 and System n decreases.

DISCUSSION 

We  analyzed  the  effect  of  having  an  irreversible  reaction  at 
different  positions  in  an  unbranched  biosynthetic  pathway 
with  all  other  reactions  being  reversible.  We  also  analyzed 
the effect of having a reversible reaction at different positions
in pathways with all other reactions being irreversible
(data not shown). The results are qualitatively similar;
namely, the best position for the single irreversible reaction
is at the beginning of the pathway, whereas the best position
for the single reversible reaction is at the end of the pathway.
The method used for our analysis, mathematically
controlled comparisons, often allows one to obtain symbolic
(and thus general) results when comparing systemic properties
of alternative models. When this is not possible, the
method  also  can  be  used  numerically  to  obtain  results  that  are 
general  in  a  statistical  sense.  Comparisons  were  made  based  on 
functional  effectiveness,  as  judged  by  the  seven  quantitative 
criteria  described  in  detail  in  the  Methods  section. 

In this work we have found a limited number of symbolic
comparisons whose conclusions do not depend on the specific
values of the parameters. The reference pathway with
all reactions fully reversible (System 0) is more robust to
perturbations in the values of the parameters (criterion 3)
than is an otherwise equivalent alternative pathway with an
irreversible synthesis of end product. Also, when comparing
reference System 0 with alternative Systems i (i ≠ 1), where
reaction i is irreversible, the flux through System 0 is more
responsive to changes in the demand for end product (criterion
5), whereas the concentrations of its intermediates are
less responsive (criterion 6). On the other hand, the flux is
more responsive (criterion 5) and the concentrations are less
responsive (criterion 6) to the demand for end product in
System 1 than in System 0. Taken together, these results
imply that reference System 0 is superior to alternative
System n on the bases of criteria 3, 5, and 6, superior to
alternative Systems i (i ≠ 1, n) on the bases of criteria 5 and
6, but inferior to alternative System 1 on the bases of criteria
5 and 6. Not much can be said analytically about the
comparison of these systems based on other criteria.

Additional  conclusions  that  are  general  in  a  statistical 
sense  can  be  obtained  by  means  of  numerical  comparisons. 
These  indicate  that  the  reference  System  0  is,  on  average, 
better  than  or  similar  to  the  alternative  Systems  i  (i  >  1)  on 
the  bases  of  all  the  criteria  except  transient  time  (criterion 
7).  These  numerical  comparisons  also  indicate  that  the  al¬ 
ternative  System  1  is,  on  average,  better  than  or  similar  to 
the  reference  System  0  on  the  bases  of  all  the  criteria  except 
some components of robustness (criterion 3). The differences
in value for those components that favor reference
System 0 over alternative System 1 are less significant when
the  systems  are  optimized  according  to  criterion  1  than 
when  these  systems  are  not  so  optimized.  Thus,  alternative 
System  1  is,  on  average,  better  than  or  similar  to  all  other 
systems  under  the  following  conditions:  The  concentrations 
of intermediates are equally low (criterion 1). The logarithmic
gains in concentration with respect to change in the
level  of  initial  substrate  also  are  equally  low  (criterion  2). 
The  robustness  of  all  the  intermediates,  with  one  exception, 
is  similar.  Although,  as  noted  above,  the  first  intermediate 
and  the  flux  are  less  robust  in  System  1,  these  differences 
are  less  significant  when  criterion  1  is  satisfied  (criterion  3). 
The  margins  of  stability  are  similar  (criterion  4).  Flux  is 
more  responsive  to  changes  in  demand  for  end  product 
(criterion 5). Concentrations of intermediates are less
responsive to changes in demand for end product (criterion 6).
Transient  times  are  shorter  (criterion  7). 

The combination of analytical and numerical results presented
in this paper provides a functional rationale for why
irreversible reactions are found predominantly at the beginnings
of unbranched biosynthetic pathways.


APPENDIX 

Parametric constraints for external equivalence. The number of reactions in the pathway is n, where n varies from 2 to 7. The position of the irreversible reaction in the pathway is i, where i varies from 1 to n. An extra constraint, g\a =, is common to all cases when the irreversible reaction is in the first position, i.e., when i = 1.
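The origin of constraints of this kind can be illustrated with the simplest case, n = 2 with the irreversible reaction at i = 2. The sketch below assumes power-law rate laws of the form v_k = α_k X_{k-1}^{g_{k,k-1}} X_k^{g_{kk}}, with the last step depending only on X_n. That rate form is an assumption inferred from the structure of the expressions in this appendix and is not stated here. The symbols y_k = log[X_k] and F (the logarithm of the steady-state flux) are shorthand introduced only for this illustration.

```latex
\begin{align*}
F &= \log[\alpha_2] + g_{21}\,y_1 + g_{22}\,y_2
   && \text{(step 2, reference system)} \\
F &= \log[\alpha_3] + g_{32}\,y_2
   && \text{(step 3)} \\
y_2 &= \frac{F - \log[\alpha_3]}{g_{32}}
   && \text{(solve step 3 for } y_2\text{)} \\
F\left(1 - \frac{g_{22}}{g_{32}}\right)
  &= \log[\alpha_2] - \frac{g_{22}}{g_{32}}\log[\alpha_3] + g_{21}\,y_1
   && \text{(substitute into step 2)} \\
F &= \frac{g_{32}\log[\alpha_2] - g_{22}\log[\alpha_3]}{g_{32} - g_{22}}
   + \frac{g_{21}g_{32}}{g_{32} - g_{22}}\,y_1
\end{align*}
```

Under the same assumption, an irreversible second step has no dependence on its own product, so its flux is F = log[α_2'] + g_{21}' y_1. Matching the constant term and the coefficient of y_1 gives log[α_2'] = (g_{32} log[α_2] - g_{22} log[α_3])/(g_{32} - g_{22}) and g_{21}' = g_{21}g_{32}/(g_{32} - g_{22}).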


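n = 2

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - \frac{g_{11}\log[\alpha_2/\alpha_3]}{g_{21}}; \qquad g_{12}' = g_{12} + \frac{g_{11}(g_{32} - g_{22})}{g_{21}}$$

i = 2:
$$\log[\alpha_2'] = \frac{g_{32}\log[\alpha_2] - g_{22}\log[\alpha_3]}{g_{32} - g_{22}}; \qquad g_{21}' = \frac{g_{21}g_{32}}{g_{32} - g_{22}}$$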


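n = 3

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - \frac{g_{11}\left(g_{32}\log[\alpha_2/\alpha_4] - g_{22}\log[\alpha_3/\alpha_4]\right)}{g_{21}g_{32}}; \qquad g_{13}' = g_{13} + \frac{g_{11}\left(g_{32}g_{43} - g_{22}(g_{43} - g_{33})\right)}{g_{21}g_{32}}$$

i = 2:
$$\log[\alpha_2'] = \frac{g_{43}\left(g_{32}\log[\alpha_2] - g_{22}\log[\alpha_3]\right) + g_{22}g_{33}\log[\alpha_4]}{g_{32}g_{43} - g_{22}(g_{43} - g_{33})}; \qquad g_{21}' = \frac{g_{21}g_{32}g_{43}}{g_{32}g_{43} - g_{22}(g_{43} - g_{33})}$$

i = 3:
$$\log[\alpha_3'] = \frac{g_{43}\log[\alpha_3] - g_{33}\log[\alpha_4]}{g_{43} - g_{33}}; \qquad g_{32}' = \frac{g_{32}g_{43}}{g_{43} - g_{33}}$$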


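n = 4

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - g_{11}\,\frac{g_{32}g_{43}\log[\alpha_2/\alpha_5] - g_{22}\left(g_{43}\log[\alpha_3/\alpha_5] - g_{33}\log[\alpha_4/\alpha_5]\right)}{g_{21}g_{32}g_{43}}$$
$$g_{14}' = g_{14} + \frac{g_{11}\left(g_{22}g_{33}(g_{54} - g_{44}) + g_{43}g_{54}(g_{32} - g_{22})\right)}{g_{21}g_{32}g_{43}}$$

i = 2:
$$\log[\alpha_2'] = \frac{g_{43}g_{54}\left(g_{32}\log[\alpha_2] - g_{22}\log[\alpha_3]\right) + g_{22}g_{33}\left(g_{54}\log[\alpha_4] - g_{44}\log[\alpha_5]\right)}{g_{32}g_{43}g_{54} - g_{22}\left(g_{43}g_{54} - g_{33}(g_{54} - g_{44})\right)}$$
$$g_{21}' = \frac{g_{21}g_{32}g_{43}g_{54}}{g_{32}g_{43}g_{54} - g_{22}\left(g_{43}g_{54} - g_{33}(g_{54} - g_{44})\right)}$$

i = 3:
$$\log[\alpha_3'] = \frac{g_{54}\left(g_{43}\log[\alpha_3] - g_{33}\log[\alpha_4]\right) + g_{33}g_{44}\log[\alpha_5]}{g_{43}g_{54} - g_{33}(g_{54} - g_{44})}; \qquad g_{32}' = \frac{g_{32}g_{43}g_{54}}{g_{43}g_{54} - g_{33}(g_{54} - g_{44})}$$

i = 4:
$$\log[\alpha_4'] = \frac{g_{54}\log[\alpha_4] - g_{44}\log[\alpha_5]}{g_{54} - g_{44}}; \qquad g_{43}' = \frac{g_{43}g_{54}}{g_{54} - g_{44}}$$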


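n = 5

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - g_{11}\,\frac{g_{32}g_{43}g_{54}\log[\alpha_2/\alpha_6] - g_{22}g_{43}g_{54}\log[\alpha_3/\alpha_6] + g_{22}g_{33}g_{54}\log[\alpha_4/\alpha_6] - g_{22}g_{33}g_{44}\log[\alpha_5/\alpha_6]}{g_{21}g_{32}g_{43}g_{54}}$$
$$g_{15}' = g_{15} + g_{11}\,\frac{g_{22}g_{33}\left(g_{54}g_{65} - g_{44}(g_{65} - g_{55})\right) + g_{43}g_{54}g_{65}(g_{32} - g_{22})}{g_{21}g_{32}g_{43}g_{54}}$$

i = 2:
$$\log[\alpha_2'] = \frac{g_{32}g_{43}g_{54}g_{65}\log[\alpha_2] - g_{22}g_{43}g_{54}g_{65}\log[\alpha_3] + g_{22}g_{33}g_{54}g_{65}\log[\alpha_4] - g_{22}g_{33}g_{44}g_{65}\log[\alpha_5] + g_{22}g_{33}g_{44}g_{55}\log[\alpha_6]}{g_{32}g_{43}g_{54}g_{65} - g_{22}\left(g_{43}g_{54}g_{65} - g_{33}\left(g_{54}g_{65} - g_{44}(g_{65} - g_{55})\right)\right)}$$
$$g_{21}' = \frac{g_{21}g_{32}g_{43}g_{54}g_{65}}{g_{32}g_{43}g_{54}g_{65} - g_{22}\left(g_{43}g_{54}g_{65} - g_{33}\left(g_{54}g_{65} - g_{44}(g_{65} - g_{55})\right)\right)}$$

i = 3:
$$\log[\alpha_3'] = \frac{g_{43}g_{54}g_{65}\log[\alpha_3] - g_{33}g_{54}g_{65}\log[\alpha_4] + g_{33}g_{44}g_{65}\log[\alpha_5] - g_{33}g_{44}g_{55}\log[\alpha_6]}{g_{43}g_{54}g_{65} - g_{33}\left(g_{54}g_{65} - g_{44}(g_{65} - g_{55})\right)}$$
$$g_{32}' = \frac{g_{32}g_{43}g_{54}g_{65}}{g_{43}g_{54}g_{65} - g_{33}\left(g_{54}g_{65} - g_{44}(g_{65} - g_{55})\right)}$$

i = 4:
$$\log[\alpha_4'] = \frac{g_{54}g_{65}\log[\alpha_4] - g_{44}\left(g_{65}\log[\alpha_5] - g_{55}\log[\alpha_6]\right)}{g_{54}g_{65} - g_{44}(g_{65} - g_{55})}; \qquad g_{43}' = \frac{g_{43}g_{54}g_{65}}{g_{54}g_{65} - g_{44}(g_{65} - g_{55})}$$

i = 5:
$$\log[\alpha_5'] = \frac{g_{65}\log[\alpha_5] - g_{55}\log[\alpha_6]}{g_{65} - g_{55}}; \qquad g_{54}' = \frac{g_{54}g_{65}}{g_{65} - g_{55}}$$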


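n = 6

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - g_{11}\,\frac{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}\log[\alpha_2/\alpha_7] - g_{22}g_{43}g_{54}g_{65}\log[\alpha_3/\alpha_7] + g_{22}g_{33}g_{54}g_{65}\log[\alpha_4/\alpha_7] \\ & {} - g_{22}g_{33}g_{44}g_{65}\log[\alpha_5/\alpha_7] + g_{22}g_{33}g_{44}g_{55}\log[\alpha_6/\alpha_7] \end{aligned}}{g_{21}g_{32}g_{43}g_{54}g_{65}}$$
$$g_{16}' = g_{16} + g_{11}\,\frac{g_{32}g_{43}g_{54}g_{65}g_{76} - g_{22}\left(g_{43}g_{54}g_{65}g_{76} - g_{33}\left(g_{54}g_{65}g_{76} - g_{44}\left(g_{65}g_{76} - g_{55}(g_{76} - g_{66})\right)\right)\right)}{g_{21}g_{32}g_{43}g_{54}g_{65}}$$

i = 2:
$$\log[\alpha_2'] = \frac{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}\log[\alpha_2] - g_{22}g_{43}g_{54}g_{65}g_{76}\log[\alpha_3] + g_{22}g_{33}g_{54}g_{65}g_{76}\log[\alpha_4] \\ & {} - g_{22}g_{33}g_{44}g_{65}g_{76}\log[\alpha_5] + g_{22}g_{33}g_{44}g_{55}g_{76}\log[\alpha_6] - g_{22}g_{33}g_{44}g_{55}g_{66}\log[\alpha_7] \end{aligned}}{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76} - g_{22}g_{43}g_{54}g_{65}g_{76} + g_{22}g_{33}g_{54}g_{65}g_{76} \\ & {} - g_{22}g_{33}g_{44}g_{65}g_{76} + g_{22}g_{33}g_{44}g_{55}g_{76} - g_{22}g_{33}g_{44}g_{55}g_{66} \end{aligned}}$$
$$g_{21}' = \frac{g_{21}g_{32}g_{43}g_{54}g_{65}g_{76}}{g_{32}g_{43}g_{54}g_{65}g_{76} - g_{22}g_{43}g_{54}g_{65}g_{76} + g_{22}g_{33}g_{54}g_{65}g_{76} - g_{22}g_{33}g_{44}g_{65}g_{76} + g_{22}g_{33}g_{44}g_{55}g_{76} - g_{22}g_{33}g_{44}g_{55}g_{66}}$$

i = 3:
$$\log[\alpha_3'] = \frac{g_{43}g_{54}g_{65}g_{76}\log[\alpha_3] - g_{33}g_{54}g_{65}g_{76}\log[\alpha_4] + g_{33}g_{44}g_{65}g_{76}\log[\alpha_5] - g_{33}g_{44}g_{55}g_{76}\log[\alpha_6] + g_{33}g_{44}g_{55}g_{66}\log[\alpha_7]}{g_{43}g_{54}g_{65}g_{76} - g_{33}g_{54}g_{65}g_{76} + g_{33}g_{44}g_{65}g_{76} - g_{33}g_{44}g_{55}g_{76} + g_{33}g_{44}g_{55}g_{66}}$$
$$g_{32}' = \frac{g_{32}g_{43}g_{54}g_{65}g_{76}}{g_{43}g_{54}g_{65}g_{76} - g_{33}g_{54}g_{65}g_{76} + g_{33}g_{44}g_{65}g_{76} - g_{33}g_{44}g_{55}g_{76} + g_{33}g_{44}g_{55}g_{66}}$$

i = 4:
$$\log[\alpha_4'] = \frac{g_{54}g_{65}g_{76}\log[\alpha_4] - g_{44}g_{65}g_{76}\log[\alpha_5] + g_{44}g_{55}g_{76}\log[\alpha_6] - g_{44}g_{55}g_{66}\log[\alpha_7]}{g_{54}g_{65}g_{76} - g_{44}g_{65}g_{76} + g_{44}g_{55}g_{76} - g_{44}g_{55}g_{66}}$$
$$g_{43}' = \frac{g_{43}g_{54}g_{65}g_{76}}{g_{54}g_{65}g_{76} - g_{44}g_{65}g_{76} + g_{44}g_{55}g_{76} - g_{44}g_{55}g_{66}}$$

i = 5:
$$\log[\alpha_5'] = \frac{g_{65}g_{76}\log[\alpha_5] - g_{55}g_{76}\log[\alpha_6] + g_{55}g_{66}\log[\alpha_7]}{g_{65}g_{76} - g_{55}g_{76} + g_{55}g_{66}}; \qquad g_{54}' = \frac{g_{54}g_{65}g_{76}}{g_{65}g_{76} - g_{55}g_{76} + g_{55}g_{66}}$$

i = 6:
$$\log[\alpha_6'] = \frac{g_{76}\log[\alpha_6] - g_{66}\log[\alpha_7]}{g_{76} - g_{66}}; \qquad g_{65}' = \frac{g_{65}g_{76}}{g_{76} - g_{66}}$$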


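n = 7

i = 1:
$$\log[\alpha_1'] = \log[\alpha_1] - g_{11}\,\frac{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}\log[\alpha_2/\alpha_8] - g_{22}g_{43}g_{54}g_{65}g_{76}\log[\alpha_3/\alpha_8] + g_{22}g_{33}g_{54}g_{65}g_{76}\log[\alpha_4/\alpha_8] \\ & {} - g_{22}g_{33}g_{44}g_{65}g_{76}\log[\alpha_5/\alpha_8] + g_{22}g_{33}g_{44}g_{55}g_{76}\log[\alpha_6/\alpha_8] - g_{22}g_{33}g_{44}g_{55}g_{66}\log[\alpha_7/\alpha_8] \end{aligned}}{g_{21}g_{32}g_{43}g_{54}g_{65}g_{76}}$$
$$g_{17}' = g_{17} + g_{11}\,\frac{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{43}g_{54}g_{65}g_{76}g_{87} + g_{22}g_{33}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{65}g_{76}g_{87} \\ & {} + g_{22}g_{33}g_{44}g_{55}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{55}g_{66}g_{87} + g_{22}g_{33}g_{44}g_{55}g_{66}g_{77} \end{aligned}}{g_{21}g_{32}g_{43}g_{54}g_{65}g_{76}}$$

i = 2:
$$\log[\alpha_2'] = \frac{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}g_{87}\log[\alpha_2] - g_{22}g_{43}g_{54}g_{65}g_{76}g_{87}\log[\alpha_3] + g_{22}g_{33}g_{54}g_{65}g_{76}g_{87}\log[\alpha_4] - g_{22}g_{33}g_{44}g_{65}g_{76}g_{87}\log[\alpha_5] \\ & {} + g_{22}g_{33}g_{44}g_{55}g_{76}g_{87}\log[\alpha_6] - g_{22}g_{33}g_{44}g_{55}g_{66}g_{87}\log[\alpha_7] + g_{22}g_{33}g_{44}g_{55}g_{66}g_{77}\log[\alpha_8] \end{aligned}}{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{43}g_{54}g_{65}g_{76}g_{87} + g_{22}g_{33}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{65}g_{76}g_{87} \\ & {} + g_{22}g_{33}g_{44}g_{55}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{55}g_{66}g_{87} + g_{22}g_{33}g_{44}g_{55}g_{66}g_{77} \end{aligned}}$$
$$g_{21}' = \frac{g_{21}g_{32}g_{43}g_{54}g_{65}g_{76}g_{87}}{\begin{aligned} & g_{32}g_{43}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{43}g_{54}g_{65}g_{76}g_{87} + g_{22}g_{33}g_{54}g_{65}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{65}g_{76}g_{87} \\ & {} + g_{22}g_{33}g_{44}g_{55}g_{76}g_{87} - g_{22}g_{33}g_{44}g_{55}g_{66}g_{87} + g_{22}g_{33}g_{44}g_{55}g_{66}g_{77} \end{aligned}}$$

i = 3:
$$\log[\alpha_3'] = \frac{\begin{aligned} & g_{43}g_{54}g_{65}g_{76}g_{87}\log[\alpha_3] - g_{33}g_{54}g_{65}g_{76}g_{87}\log[\alpha_4] + g_{33}g_{44}g_{65}g_{76}g_{87}\log[\alpha_5] \\ & {} - g_{33}g_{44}g_{55}g_{76}g_{87}\log[\alpha_6] + g_{33}g_{44}g_{55}g_{66}g_{87}\log[\alpha_7] - g_{33}g_{44}g_{55}g_{66}g_{77}\log[\alpha_8] \end{aligned}}{g_{43}g_{54}g_{65}g_{76}g_{87} - g_{33}g_{54}g_{65}g_{76}g_{87} + g_{33}g_{44}g_{65}g_{76}g_{87} - g_{33}g_{44}g_{55}g_{76}g_{87} + g_{33}g_{44}g_{55}g_{66}g_{87} - g_{33}g_{44}g_{55}g_{66}g_{77}}$$
$$g_{32}' = \frac{g_{32}g_{43}g_{54}g_{65}g_{76}g_{87}}{g_{43}g_{54}g_{65}g_{76}g_{87} - g_{33}g_{54}g_{65}g_{76}g_{87} + g_{33}g_{44}g_{65}g_{76}g_{87} - g_{33}g_{44}g_{55}g_{76}g_{87} + g_{33}g_{44}g_{55}g_{66}g_{87} - g_{33}g_{44}g_{55}g_{66}g_{77}}$$

i = 4:
$$\log[\alpha_4'] = \frac{g_{54}g_{65}g_{76}g_{87}\log[\alpha_4] - g_{44}g_{65}g_{76}g_{87}\log[\alpha_5] + g_{44}g_{55}g_{76}g_{87}\log[\alpha_6] - g_{44}g_{55}g_{66}g_{87}\log[\alpha_7] + g_{44}g_{55}g_{66}g_{77}\log[\alpha_8]}{g_{54}g_{65}g_{76}g_{87} - g_{44}g_{65}g_{76}g_{87} + g_{44}g_{55}g_{76}g_{87} - g_{44}g_{55}g_{66}g_{87} + g_{44}g_{55}g_{66}g_{77}}$$
$$g_{43}' = \frac{g_{43}g_{54}g_{65}g_{76}g_{87}}{g_{54}g_{65}g_{76}g_{87} - g_{44}g_{65}g_{76}g_{87} + g_{44}g_{55}g_{76}g_{87} - g_{44}g_{55}g_{66}g_{87} + g_{44}g_{55}g_{66}g_{77}}$$

i = 5:
$$\log[\alpha_5'] = \frac{g_{65}g_{76}g_{87}\log[\alpha_5] - g_{55}g_{76}g_{87}\log[\alpha_6] + g_{55}g_{66}g_{87}\log[\alpha_7] - g_{55}g_{66}g_{77}\log[\alpha_8]}{g_{65}g_{76}g_{87} - g_{55}g_{76}g_{87} + g_{55}g_{66}g_{87} - g_{55}g_{66}g_{77}}$$
$$g_{54}' = \frac{g_{54}g_{65}g_{76}g_{87}}{g_{65}g_{76}g_{87} - g_{55}g_{76}g_{87} + g_{55}g_{66}g_{87} - g_{55}g_{66}g_{77}}$$

i = 6:
$$\log[\alpha_6'] = \frac{g_{76}g_{87}\log[\alpha_6] - g_{66}g_{87}\log[\alpha_7] + g_{66}g_{77}\log[\alpha_8]}{g_{76}g_{87} - g_{66}g_{87} + g_{66}g_{77}}; \qquad g_{65}' = \frac{g_{65}g_{76}g_{87}}{g_{76}g_{87} - g_{66}g_{87} + g_{66}g_{77}}$$

i = 7:
$$\log[\alpha_7'] = \frac{g_{87}\log[\alpha_7] - g_{77}\log[\alpha_8]}{g_{87} - g_{77}}; \qquad g_{76}' = \frac{g_{76}g_{87}}{g_{87} - g_{77}}$$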
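The tabulated constraints appear to follow one recursion over the reactions downstream of the irreversible step. The Python sketch below is an illustration and not the authors' procedure. It assumes power-law rate laws v_k = α_k X_{k-1}^{g_{k,k-1}} X_k^{g_{kk}} for k = 2, ..., n, a final step v_{n+1} = α_{n+1} X_n^{g_{n+1,n}}, and feedback of X_n on the first step through g_{1n}. These forms are inferred from the structure of the expressions above. The function name `external_equivalence` and the dictionary layout are hypothetical.

```python
def external_equivalence(i, n, g, log_alpha):
    """Return (log alpha_i', g') for an irreversible reaction at position i.

    g maps (row, col) to the kinetic order g_{row,col} of the reference
    system; log_alpha maps k to log(alpha_k) for k = 1 .. n + 1.
    g' is g'_{1n} when i == 1 and g'_{i,i-1} otherwise.
    ASSUMPTION: the power-law rate forms described in the lead-in text.
    """
    if not 1 <= i <= n:
        raise ValueError("i must lie between 1 and n")
    # Write log X_k = A*F - B, where F is the log of the steady-state flux.
    # The final step gives F = log_alpha[n+1] + g[n+1,n] * log X_n.
    A = 1.0 / g[(n + 1, n)]
    B = log_alpha[n + 1] / g[(n + 1, n)]
    # Walk upstream through the unchanged steps k = n, n-1, ..., i+1;
    # each pass turns the relation for log X_k into one for log X_{k-1}.
    for k in range(n, i, -1):
        A, B = ((1.0 - g[(k, k)] * A) / g[(k, k - 1)],
                (log_alpha[k] - g[(k, k)] * B) / g[(k, k - 1)])
    if i == 1:
        # Step 1 loses its dependence on X_1; that dependence is rewritten
        # in terms of X_n using F = log_alpha[n+1] + g[n+1,n] * log X_n.
        log_alpha_new = log_alpha[1] + g[(1, 1)] * (A * log_alpha[n + 1] - B)
        g_new = g[(1, n)] + g[(1, 1)] * A * g[(n + 1, n)]
    else:
        # Step i loses its dependence on X_i; solve its rate law for F.
        scale = 1.0 - g[(i, i)] * A
        log_alpha_new = (log_alpha[i] - g[(i, i)] * B) / scale
        g_new = g[(i, i - 1)] / scale
    return log_alpha_new, g_new


# Hypothetical kinetic orders and rate constants for a pathway with n = 2.
g_example = {(1, 1): -0.4, (1, 2): -0.7, (2, 1): 1.0, (2, 2): -0.5, (3, 2): 0.8}
log_alpha_example = {1: 0.3, 2: 0.2, 3: -0.1}
for position in (1, 2):
    print(position, external_equivalence(position, 2, g_example, log_alpha_example))
```

Under these assumptions the recursion reproduces the closed-form entries for small n, which gives a way to check a transcribed entry for larger n against its neighbours.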